Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-10-28 Thread Andrey Gura
Continuing my previous mail...

The same method rethrows an exception, which will lead to the failure of a
metrics exporter. The method should instead return some numeric value that
indicates the failure.
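
A minimal sketch of the suggestion above, assuming the metric is exposed as a plain numeric supplier. The class name GuardedLongMetric and the sentinel value -1 are hypothetical and are not taken from the Ignite code base; the point is only that a failing computation is reported as a number instead of an exception reaching the exporter.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical guard around a metric value supplier (not an Ignite class).
 * A failure is reported as a sentinel value instead of propagating
 * the exception to the metrics exporter.
 */
final class GuardedLongMetric implements LongSupplier {
    /** Assumed sentinel meaning "the value could not be computed". */
    static final long FAILED = -1L;

    /** The actual (possibly throwing) metric computation. */
    private final LongSupplier delegate;

    GuardedLongMetric(LongSupplier delegate) {
        this.delegate = delegate;
    }

    @Override
    public long getAsLong() {
        try {
            return delegate.getAsLong();
        }
        catch (RuntimeException e) {
            // Swallow the failure so that one broken metric cannot
            // break an exporter that iterates over all metrics.
            return FAILED;
        }
    }
}
```

An exporter that reads the value through such a wrapper sees -1 for a broken metric and can keep exporting the rest; whether -1 is an acceptable "failure" marker depends on the metric's value range.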

On Wed, Oct 28, 2020 at 3:09 PM Andrey Gura  wrote:
>
> Hi there,
>
> I accidentally stumbled upon a potential performance problem in this commit.
>
> The CacheGroupMetricImpls.getPagesLeftForReencryption method has at
> least two problems:
>
>  - Relatively major: To calculate a value for one metric, the method
> has O(N) complexity (N is the number of partitions). That isn't good.
> A better approach is to maintain a precalculated or estimated value
> during the re-encryption process and just return that value.
>  - Major: For each partition, this method calls PageStore.exists().
> This leads to N calls to the file system (they may be cached or may
> not; we can't just hope). So with the default affinity configuration,
> this method will touch the file system 1024 times per metric value
> calculation. To make it more dramatic, multiply 1024 by the number of
> cache groups on a node.
>
> Finally, we have auxiliary functionality (metrics) which could affect
> the behavior of the whole node (and potentially the cluster).
>
> Please fix this problem and be more careful in the future.

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-10-28 Thread Pavel Pereslegin
Andrey,
thanks for your comment.

I will fix this problem shortly.

On Wed, Oct 28, 2020 at 15:10 Andrey Gura wrote:
>
> Hi there,
>
> I accidentally stumbled upon a potential performance problem in this commit.
>
> The CacheGroupMetricImpls.getPagesLeftForReencryption method has at
> least two problems:
>
>  - Relatively major: To calculate a value for one metric, the method
> has O(N) complexity (N is the number of partitions). That isn't good.
> A better approach is to maintain a precalculated or estimated value
> during the re-encryption process and just return that value.
>  - Major: For each partition, this method calls PageStore.exists().
> This leads to N calls to the file system (they may be cached or may
> not; we can't just hope). So with the default affinity configuration,
> this method will touch the file system 1024 times per metric value
> calculation. To make it more dramatic, multiply 1024 by the number of
> cache groups on a node.
>
> Finally, we have auxiliary functionality (metrics) which could affect
> the behavior of the whole node (and potentially the cluster).
>
> Please fix this problem and be more careful in the future.

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-10-28 Thread Andrey Gura
Hi there,

I accidentally stumbled upon a potential performance problem in this commit.

The CacheGroupMetricImpls.getPagesLeftForReencryption method has at
least two problems:

 - Relatively major: To calculate a value for one metric, the method
has O(N) complexity (N is the number of partitions). That isn't good.
A better approach is to maintain a precalculated or estimated value
during the re-encryption process and just return that value.
 - Major: For each partition, this method calls PageStore.exists().
This leads to N calls to the file system (they may be cached or may
not; we can't just hope). So with the default affinity configuration,
this method will touch the file system 1024 times per metric value
calculation. To make it more dramatic, multiply 1024 by the number of
cache groups on a node.

Finally, we have auxiliary functionality (metrics) which could affect
the behavior of the whole node (and potentially the cluster).

Please fix this problem and be more careful in the future.
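
The "precalculated value" idea from the first bullet could look roughly like the sketch below. It is an illustration under assumptions, not the actual Ignite fix: the class ReencryptionPagesCounter and its callback names are hypothetical. The re-encryption process updates a counter as it goes, so reading the metric is O(1) and never touches the file system.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-cache-group counter (not an Ignite class).
 * The re-encryption worker maintains it; the metric only reads it.
 */
final class ReencryptionPagesCounter {
    /** Pages that still have to be re-encrypted in this cache group. */
    private final AtomicLong pagesLeft = new AtomicLong();

    /** Called once per partition when its re-encryption is scheduled. */
    void onPartitionScheduled(int pageCount) {
        pagesLeft.addAndGet(pageCount);
    }

    /** Called by the worker after it has re-encrypted a batch of pages. */
    void onPagesReencrypted(int pageCount) {
        // Clamp at zero so an over-reported batch cannot produce a negative metric.
        pagesLeft.updateAndGet(left -> Math.max(0L, left - pageCount));
    }

    /** Metric getter: a single volatile read, no partition scan, no I/O. */
    long pagesLeft() {
        return pagesLeft.get();
    }
}
```

With 1024 partitions the cost moves from the metric reader (1024 file system checks per read) to the worker, which already visits every partition once.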


Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-10-23 Thread Pavel Pereslegin
Hello folks,

Thanks to everyone who joined the review; I greatly appreciate your
helpful comments.

If there are no objections, we will merge this patch [1] shortly.

[1] https://github.com/apache/ignite/pull/7941


Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-10-05 Thread Maksim Stepachev
Hi,

I'm going to do it.

On Sat, Oct 3, 2020 at 21:47 Alex Plehanov wrote:

> Hello guys,
>
> I've finished the review and approved the patch.
> Would anybody else like to review it?

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-10-03 Thread Alex Plehanov
Hello guys,

I've finished the review and approved the patch.
Would anybody else like to review it?

On Mon, Sep 28, 2020 at 11:38 Pavel Pereslegin wrote:

> Hello, Maksim!
>
> I am currently working on the review notes from Alexey Plekhanov and will
> let you know when I finish.
>
> On Mon, Sep 28, 2020 at 11:04 Maksim Stepachev wrote:
> >
> > Hi, Pavel.
> >
> > As I see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843]
> > is "PATCH AVAILABLE". Is this ticket finished?
> >
> > On Thu, Aug 13, 2020 at 13:49 Pavel Pereslegin wrote:
> >
> > > Hello all.
> > >
> > > I'm working on TDE cache group key rotation [1] and I have a couple of
> > > questions about partition re-encryption.
> > >
> > > As described in the wiki [2], the process of re-encryption at the
> > > moment consists of sequentially marking memory pages as dirty; this
> > > process does not look resource-intensive.
> > > Do you think it is necessary to do this in a multithreaded mode, or
> > > is a single thread enough?
> > > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > > 2.4 GHz, SSD Huawei ES3600P 3.2 TB, ThrottlingPolicy =
> > > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load. As a result,
> > > single-threaded encryption kept disk load within 30%. At the same time,
> > > the total re-encryption speed was around 60 MB/s, which allows one
> > > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > > this performance is enough.)
> > >
> > > The second question is about the approach to storing the re-encryption
> > > status.
> > > At the moment, the re-encryption status includes two parameters: the
> > > total number of pages in the partition at the time of the start of
> > > re-encryption (int) and the index of the last re-encrypted page (int).
> > > These 8 bytes are stored in the metapage on the checkpoint (which
> > > ensures that if the checkpoint does not happen, we will continue the
> > > process from the last page written to disk).
> > > However, if multithreaded partition scanning does not make sense, then
> > > it seems possible to change the implementation and leave the metapage
> > > structure unchanged: store only the "pointer" to the partition (and
> > > the cache group) in the metastore and scan in strict order.
> > > The approach of storing the status in the metapage of the partition
> > > seems to me more flexible and stable, and it has a number of advantages
> > > over the "pointer" approach:
> > > 1. Since we save the total number of pages at re-encryption
> > > startup, we will not scan extra pages that may be added to the
> > > partition later.
> > > 2. We can move partitions between nodes, and re-encryption should
> > > continue from a certain point on the new node.
> > > 3. If a partition is (re)created during cache group re-encryption, it
> > > will not be re-encrypted (since its re-encryption status will be reset
> > > and all data is encrypted with the latest encryption key after
> > > (re)creation).
> > >
> > > Do you think single-threaded mode is enough?
> > > Is it better to keep the re-encryption status in the metapage or store
> > > the "pointer" in the metastore?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
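
The two-int status described in the quoted message (total pages at start, index of the last re-encrypted page, 8 bytes in total) can be pictured with the sketch below. It is only an illustration: the class ReencryptionStatus, its field names, and the choice to treat the second int as the index to resume from are assumptions, and the real metapage layout is not shown in this thread.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical 8-byte re-encryption status record (not the Ignite metapage layout).
 */
final class ReencryptionStatus {
    /** Two ints, matching the "8 bytes" mentioned in the message. */
    static final int SIZE_BYTES = 2 * Integer.BYTES;

    /** Page count of the partition when re-encryption started. */
    final int totalPages;

    /** Assumption: index of the next page to process after a restart. */
    final int resumeIdx;

    ReencryptionStatus(int totalPages, int resumeIdx) {
        this.totalPages = totalPages;
        this.resumeIdx = resumeIdx;
    }

    /** Serializes the record, e.g. when the checkpoint writes the metapage. */
    void writeTo(ByteBuffer buf) {
        buf.putInt(totalPages);
        buf.putInt(resumeIdx);
    }

    /** Restores the record, e.g. on node start or after a partition is moved. */
    static ReencryptionStatus readFrom(ByteBuffer buf) {
        int total = buf.getInt();
        int resume = buf.getInt();

        return new ReencryptionStatus(total, resume);
    }

    /** Pages added after the start are never counted, since totalPages is fixed. */
    int pagesLeft() {
        return Math.max(0, totalPages - resumeIdx);
    }
}
```

Because the record travels with the partition, it covers advantages 1 and 2 from the list above: the scan bound is frozen at start, and a node that receives the partition can resume from the stored index.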
> > >
> > > On Fri, Jul 31, 2020 at 11:09 Pavel Pereslegin wrote:
> > > >
> > > > Hello,
> > > >
> > > > I'll expand a bit on the answer about calculating the CRC. The problem
> > > > is not that it is calculated twice, but that now, for encrypted pages,
> > > > EncryptedFileIO checks physical integrity and FilePageStore checks
> > > > the correctness of the encryption key. From my point of view, it
> > > > should be the other way around: the lower (delegated) FileIO should
> > > > check the physical integrity and EncryptedFileIO should check the
> > > > correctness of the encryption key.
> > > >
> > > > On Fri, Jul 31, 2020 at 10:40 Pavel Pereslegin wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > filePageStore - what should we do with this?
> > > > >
> > > > > We need to calculate the CRC of encrypted data, because we may be
> > > > > using the wrong encryption key to decrypt data, in which case we
> > > > > will not be able to tell whether the physical integrity is violated
> > > > > or the wrong encryption key is used.
> > > > >
> > > > > > 9. Question - How do we optimize when we can check that this
> > > > > > page is already encrypted by parallel loading? Maybe we should do
> > > > > > this in Phase 4?
> > > > >
> > > > > To do this, we need to store the encryption key ID in memory (at
> > > > > least), but this is not easy to do right now without breaking
> > > > > binary compatibility.
> > > > >
> > > > > > 7. Question - the current implementation does not use the
> > > > > > throttling that is implemented in PDS. Users should 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-09-28 Thread Pavel Pereslegin
Hello, Maksim!

I am currently working on the review notes from Alexey Plekhanov and will
let you know when I finish.

On Mon, Sep 28, 2020 at 11:04 Maksim Stepachev wrote:
>
> Hi, Pavel.
>
> As I see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843]
> is "PATCH AVAILABLE". Is this ticket finished?
>
> On Thu, Aug 13, 2020 at 13:49 Pavel Pereslegin wrote:
>
> > Hello all.
> >
> > I'm working on TDE cache group key rotation [1] and I have a couple of
> > questions about partition re-encryption.
> >
> > As described in the wiki [2], the process of re-encryption at the
> > moment consists of sequentially marking memory pages as dirty; this
> > process does not look resource-intensive.
> > Do you think it is necessary to do this in a multithreaded mode, or
> > is a single thread enough?
> > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > 2.4 GHz, SSD Huawei ES3600P 3.2 TB, ThrottlingPolicy =
> > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load. As a result,
> > single-threaded encryption kept disk load within 30%. At the same time,
> > the total re-encryption speed was around 60 MB/s, which allows one
> > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > this performance is enough.)
> >
> > The second question is about the approach to storing the re-encryption
> > status.
> > At the moment, the re-encryption status includes two parameters: the
> > total number of pages in the partition at the time of the start of
> > re-encryption (int) and the index of the last re-encrypted page (int).
> > These 8 bytes are stored in the metapage on the checkpoint (which
> > ensures that if the checkpoint does not happen, we will continue the
> > process from the last page written to disk).
> > However, if multithreaded partition scanning does not make sense, then
> > it seems possible to change the implementation and leave the metapage
> > structure unchanged: store only the "pointer" to the partition (and
> > the cache group) in the metastore and scan in strict order.
> > The approach of storing the status in the metapage of the partition
> > seems to me more flexible and stable, and it has a number of advantages
> > over the "pointer" approach:
> > 1. Since we save the total number of pages at re-encryption
> > startup, we will not scan extra pages that may be added to the
> > partition later.
> > 2. We can move partitions between nodes, and re-encryption should
> > continue from a certain point on the new node.
> > 3. If a partition is (re)created during cache group re-encryption, it
> > will not be re-encrypted (since its re-encryption status will be reset
> > and all data is encrypted with the latest encryption key after
> > (re)creation).
> >
> > Do you think single-threaded mode is enough?
> > Is it better to keep the re-encryption status in the metapage or store
> > the "pointer" in the metastore?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > [2]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
> >
> > On Fri, Jul 31, 2020 at 11:09, Pavel Pereslegin wrote:
> > >
> > > Hello,
> > >
> > > I'll expand the answer a bit about calculating CRC, the problem is not
> > > that it is calculated twice, but that now for encrypted pages,
> > > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > > the correctness of the encryption key, but from my point of view, it
> > > should be vice versa - the lower (delegated) FileIO should check the
> > > physical integrity and EncryptedFileIO should check the correctness of
> > > the encryption key.
> > >
> > > On Fri, Jul 31, 2020 at 10:40, Pavel Pereslegin wrote:
> > > >
> > > > Hello,
> > > >
> > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > filePageStore - what should we do with this?
> > > >
> > > > We need to calculate the CRC of encrypted data, because we may be
> > > > using the wrong encryption key to decrypt data, in which case we will
> > > > not understand if the physical integrity is violated or the wrong
> > > > encryption key is used.
> > > >
> > > > > 9. Question - How do we optimize when we can check that this page is
> > > > > already encrypted by parallel loading? Maybe we should do this in Phase 4?
> > > >
> > > > To do this, we need to store the encryption key ID in memory (at
> > > > least), but this is not easy to do right now without breaking binary
> > > > compatibility.
> > > >
> > > > > 7. Question -the current implementation does not use the throttling that
> > > > > is implemented in PDS. Users should set the throughput such as 5 MB per
> > > > > second, but not the timeout, packet size, or stream size.
> > > >
> > > > I've added a simple rate limiter for this.
> > > >
> > > > > 8. Question - why we add a lot of system properties?
> > > > >> Can you, please, list system properties that should be moved to the
> > > > >> configuration?
> > > >
> > > > It's about the 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-09-28 Thread Maksim Stepachev
Hi, Pavel.

As far as I can see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843]
is in the "PATCH AVAILABLE" state. Is this ticket finished?

On Thu, Aug 13, 2020 at 13:49, Pavel Pereslegin wrote:

> Hello all.
>
> I'm working on TDE cache group key rotation [1] and I have a couple of
> questions about partition re-encryption.
>
> As described in the wiki [2], the process of re-encryption at the
> moment consists of sequentially marking memory pages as dirty, this
> process looks not resource-intensive.
> Do you think it is necessary to do this in a multithreaded mode or
> single thread is enough?
> (We started testing re-encryption on dedicated servers (Xeon E5-2680
> 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> single-threaded encryption loaded disk within 30%. At the same time,
> the total re-encryption speed was around 60 MB/s, which allows one
> node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> this performance is enough.)
>
> The second question is about the approach to storing the re-encryption
> status.
> At the moment, the re-encryption status includes two parameters - the
> total number of pages in the partition at the time of the start of
> re-encryption (int) and the index of the last re-encrypted page (int).
> These 8 bytes are stored in the metapage on the checkpoint (which
> ensures that if the checkpoint does not happen, we will continue the
> process from the last page written to disk).
> However, if multithread partition scanning does not make sense, then
> it seems that it is possible to change the implementation and don't
> change the metapage structure. Store only the "pointer" of the
> partition (and the cache group) in the metastore and scan in strict
> order.
> The approach with storing the status in the metapage of the partition
> seems to me more flexible, stable and has a number of advantages over
> the "pointer" approach:
> 1. Since we saving the total number of pages at the re-encryption
> startup - we will not scan extra pages that may be added to the
> partition later.
> 2. We can move partitions between nodes and re-encryption should
> continue from a certain point on the new node.
> 3. If a partition is (re)created during cache group re-encryption, it
> will not be re-encrypted (since its re-encryption status will be reset
> and all data is encrypted with the latest encryption key after
> (re)creation.
>
> Do you think single-threaded mode is enough?
> Is it better to keep the re-encryption status in the metapage or store
> the "pointer" in the metastore?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12843
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
>
> On Fri, Jul 31, 2020 at 11:09, Pavel Pereslegin wrote:
> >
> > Hello,
> >
> > I'll expand the answer a bit about calculating CRC, the problem is not
> > that it is calculated twice, but that now for encrypted pages,
> > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > the correctness of the encryption key, but from my point of view, it
> > should be vice versa - the lower (delegated) FileIO should check the
> > physical integrity and EncryptedFileIO should check the correctness of
> > the encryption key.
> >
> > On Fri, Jul 31, 2020 at 10:40, Pavel Pereslegin wrote:
> > >
> > > Hello,
> > >
> > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > filePageStore - what should we do with this?
> > >
> > > We need to calculate the CRC of encrypted data, because we may be
> > > using the wrong encryption key to decrypt data, in which case we will
> > > not understand if the physical integrity is violated or the wrong
> > > encryption key is used.
> > >
> > > > 9. Question - How do we optimize when we can check that this page is
> > > > already encrypted by parallel loading? Maybe we should do this in Phase 4?
> > >
> > > To do this, we need to store the encryption key ID in memory (at
> > > least), but this is not easy to do right now without breaking binary
> > > compatibility.
> > >
> > > > 7. Question -the current implementation does not use the throttling that
> > > > is implemented in PDS. Users should set the throughput such as 5 MB per
> > > > second, but not the timeout, packet size, or stream size.
> > >
> > > I've added a simple rate limiter for this.
> > >
> > > > 8. Question - why we add a lot of system properties?
> > > >> Can you, please, list system properties that should be moved to the
> > > >> configuration?
> > >
> > > It's about the background re-encryption properties, for now, it is:
> > > - re-encryption speed limit (in megabytes per second)
> > > - threads count used for re-encryption
> > > - count of pages in batch, processed under checkpoint lock
> > > - flag to completely disable background re-encryption
> > >
> > > > 11. We should remember about complicated test scenarios with failover
> > >

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-08-13 Thread Pavel Pereslegin
Hello all.

I'm working on TDE cache group key rotation [1] and I have a couple of
questions about partition re-encryption.

As described in the wiki [2], re-encryption currently consists of
sequentially marking memory pages as dirty, and this process does not
look resource-intensive.
Do you think it is necessary to do this in multithreaded mode, or is a
single thread enough?
(We started testing re-encryption on dedicated servers (Xeon E5-2680
2.4 GHz, SSD Huawei ES3600P 3.2 TB, ThrottlingPolicy =
CHECKPOINT_BUFFER_ONLY) with no speed limit and no load. As a result,
single-threaded re-encryption loaded the disk by no more than 30%. At the
same time, the total re-encryption speed was around 60 MB/s, which allows
one node to re-encrypt 1 TB of data in about 5 hours, and this performance
seems sufficient.)
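
As a quick sanity check of the "about 5 hours" figure, here is a minimal
Java sketch of the arithmetic. The class and method names are hypothetical,
and it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a
constant sustained rate:

```java
// Hypothetical helper, not part of Ignite: estimates re-encryption time.
final class ReencryptionEstimate {
    /** Hours needed to process totalBytes at a sustained rate of bytesPerSecond. */
    static double hours(long totalBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0)
            throw new IllegalArgumentException("rate must be positive");

        return (double) totalBytes / bytesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long oneTb = 1_000_000_000_000L; // 1 TB in decimal units (assumption)
        long rate = 60L * 1_000_000L;    // 60 MB/s, the measured speed

        System.out.println(hours(oneTb, rate) + " hours");
    }
}
```

With these assumptions the estimate is roughly 4.6 hours; with binary units
(2^40 bytes at 60 * 2^20 bytes/s) it is closer to 4.9 hours, so "about 5
hours" holds either way.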

The second question is about the approach to storing the re-encryption status.
At the moment, the re-encryption status includes two parameters: the
total number of pages in the partition at the time re-encryption
started (int) and the index of the last re-encrypted page (int).
These 8 bytes are stored in the metapage on checkpoint (which
ensures that if the checkpoint does not happen, we will continue the
process from the last page written to disk).
However, if multithreaded partition scanning does not make sense, then
it seems possible to change the implementation and leave the metapage
structure unchanged: store only the "pointer" to the partition (and the
cache group) in the metastore and scan in strict order.
The approach of storing the status in the partition metapage
seems to me more flexible and stable, and has a number of advantages over
the "pointer" approach:
1. Since we save the total number of pages when re-encryption
starts, we will not scan extra pages that may be added to the
partition later.
2. We can move partitions between nodes, and re-encryption should
continue from a certain point on the new node.
3. If a partition is (re)created during cache group re-encryption, it
will not be re-encrypted (since its re-encryption status will be reset
and all data is encrypted with the latest encryption key after
(re)creation).
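
To make the metapage variant concrete, here is a minimal Java sketch of the
8-byte status described above. It is an illustration only, not the layout
used in the PR: the class name, the field names and the convention that -1
means "no page re-encrypted yet" are assumptions.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the per-partition re-encryption status (two ints).
final class ReencryptionStatus {
    static final int SIZE_BYTES = 2 * Integer.BYTES; // 8 bytes in the metapage

    final int totalPages;  // page count captured when re-encryption started
    final int lastPageIdx; // index of the last re-encrypted page, -1 if none (assumption)

    ReencryptionStatus(int totalPages, int lastPageIdx) {
        this.totalPages = totalPages;
        this.lastPageIdx = lastPageIdx;
    }

    /** Serializes the status as it could be written on checkpoint. */
    byte[] toBytes() {
        return ByteBuffer.allocate(SIZE_BYTES).putInt(totalPages).putInt(lastPageIdx).array();
    }

    /** Restores the status after a restart or after the partition moved to another node. */
    static ReencryptionStatus fromBytes(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);

        return new ReencryptionStatus(buf.getInt(), buf.getInt());
    }

    /** First page that still has to be re-encrypted. */
    int nextPageIdx() {
        return lastPageIdx + 1;
    }

    /** Pages with an index >= totalPages were added later and are never scanned. */
    boolean finished() {
        return nextPageIdx() >= totalPages;
    }
}
```

Because the status travels with the partition, resuming only needs
fromBytes(...) followed by a scan from nextPageIdx() up to totalPages.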

Do you think single-threaded mode is enough?
Is it better to keep the re-encryption status in the metapage or store
the "pointer" in the metastore?

[1] https://issues.apache.org/jira/browse/IGNITE-12843
[2] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption

On Fri, Jul 31, 2020 at 11:09, Pavel Pereslegin wrote:
>
> Hello,
>
> I'll expand the answer a bit about calculating CRC, the problem is not
> that it is calculated twice, but that now for encrypted pages,
> EncryptedFileIO checks physical integrity, and FilePageStore checks
> the correctness of the encryption key, but from my point of view, it
> should be vice versa - the lower (delegated) FileIO should check the
> physical integrity and EncryptedFileIO should check the correctness of
> the encryption key.
>
> On Fri, Jul 31, 2020 at 10:40, Pavel Pereslegin wrote:
> >
> > Hello,
> >
> > > 10. Question - CRC is read in two places encryptionFileIO and
> > > filePageStore - what should we do with this?
> >
> > We need to calculate the CRC of encrypted data, because we may be
> > using the wrong encryption key to decrypt data, in which case we will
> > not understand if the physical integrity is violated or the wrong
> > encryption key is used.
> >
> > > 9. Question - How do we optimize when we can check that this page is
> > > already encrypted by parallel loading? Maybe we should do this in Phase 4?
> >
> > To do this, we need to store the encryption key ID in memory (at
> > least), but this is not easy to do right now without breaking binary
> > compatibility.
> >
> > > 7. Question -the current implementation does not use the throttling that
> > > is implemented in PDS. Users should set the throughput such as 5 MB per
> > > second, but not the timeout, packet size, or stream size.
> >
> > I've added a simple rate limiter for this.
> >
> > > 8. Question - why we add a lot of system properties?
> > >> Can you, please, list system properties that should be moved to the 
> > >> configuration?
> >
> > It's about the background re-encryption properties, for now, it is:
> > - re-encryption speed limit (in megabytes per second)
> > - threads count used for re-encryption
> > - count of pages in batch, processed under checkpoint lock
> > - flag to completely disable background re-encryption
> >
> > > 11. We should remember about complicated test scenarios with failover
> >
> > PR contains tests for re-encryption (and key rotation) on unstable
> > topology (with baseline change and without it). I'll expand them if I
> > missed some cases.
> >
> > > 13. Will re-encryption continue after the cluster is completely stopped?
> >
> > Yes, as I mentioned earlier, we save the re-encryption status in the
> > meta page of each re-encrypted partition and 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-31 Thread Pavel Pereslegin
Hello,

Let me expand a bit on the answer about calculating the CRC. The problem is
not that it is calculated twice, but that currently, for encrypted pages,
EncryptedFileIO checks physical integrity and FilePageStore checks
the correctness of the encryption key. From my point of view, it
should be the other way around: the lower (delegated) FileIO should check
physical integrity, and EncryptedFileIO should check the correctness of
the encryption key.
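
To illustrate the layering, here is a minimal, self-contained Java sketch.
It is not Ignite's page format or the code from the PR: the class name, the
layout and the use of AES/CTR are assumptions. An outer CRC over the
encrypted bytes answers "is the page corrupted on disk?", and an inner CRC
stored inside the encrypted data answers "was the right key used?".

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical layout: [16-byte IV][AES-CTR(plain || crc(plain))][crc of all preceding bytes].
final class LayeredPageCheck {
    enum Result { OK, CORRUPTED_ON_DISK, WRONG_KEY }

    private static final int IV_LEN = 16;

    static int crc(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return (int) crc.getValue();
    }

    /** Creates a cipher; the IV is taken from the first 16 bytes of ivSrc. */
    private static Cipher cipher(int mode, byte[] key, byte[] ivSrc) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(ivSrc, 0, IV_LEN));
        return c;
    }

    static byte[] write(byte[] plain, byte[] key) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);

            // Inner CRC is part of the encrypted data: used to check the key.
            byte[] inner = ByteBuffer.allocate(plain.length + 4)
                .put(plain).putInt(crc(plain, 0, plain.length)).array();
            byte[] enc = cipher(Cipher.ENCRYPT_MODE, key, iv).doFinal(inner);

            // Outer CRC covers the bytes as stored on disk: used to check integrity.
            ByteBuffer out = ByteBuffer.allocate(IV_LEN + enc.length + 4).put(iv).put(enc);
            out.putInt(crc(out.array(), 0, IV_LEN + enc.length));
            return out.array();
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static Result read(byte[] stored, byte[] key) {
        try {
            int bodyLen = stored.length - 4;

            // Lower layer: physical integrity, no key required.
            if (ByteBuffer.wrap(stored, bodyLen, 4).getInt() != crc(stored, 0, bodyLen))
                return Result.CORRUPTED_ON_DISK;

            // Encryption layer: a wrong key yields garbage, so the inner CRC fails.
            byte[] inner = cipher(Cipher.DECRYPT_MODE, key, stored).doFinal(stored, IV_LEN, bodyLen - IV_LEN);
            int plainLen = inner.length - 4;
            boolean keyOk = ByteBuffer.wrap(inner, plainLen, 4).getInt() == crc(inner, 0, plainLen);

            return keyOk ? Result.OK : Result.WRONG_KEY;
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that a CRC is not a cryptographic MAC: this scheme only distinguishes
accidental corruption from a wrong key and gives no protection against
deliberate tampering.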

On Fri, Jul 31, 2020 at 10:40, Pavel Pereslegin wrote:
>
> Hello,
>
> > 10. Question - CRC is read in two places encryptionFileIO and
> > filePageStore - what should we do with this?
>
> We need to calculate the CRC of encrypted data, because we may be
> using the wrong encryption key to decrypt data, in which case we will
> not understand if the physical integrity is violated or the wrong
> encryption key is used.
>
> > 9. Question - How do we optimize when we can check that this page is
> > already encrypted by parallel loading? Maybe we should do this in Phase 4?
>
> To do this, we need to store the encryption key ID in memory (at
> least), but this is not easy to do right now without breaking binary
> compatibility.
>
> > 7. Question -the current implementation does not use the throttling that
> > is implemented in PDS. Users should set the throughput such as 5 MB per
> > second, but not the timeout, packet size, or stream size.
>
> I've added a simple rate limiter for this.
>
> > 8. Question - why we add a lot of system properties?
> >> Can you, please, list system properties that should be moved to the 
> >> configuration?
>
> It's about the background re-encryption properties, for now, it is:
> - re-encryption speed limit (in megabytes per second)
> - threads count used for re-encryption
> - count of pages in batch, processed under checkpoint lock
> - flag to completely disable background re-encryption
>
> > 11. We should remember about complicated test scenarios with failover
>
> PR contains tests for re-encryption (and key rotation) on unstable
> topology (with baseline change and without it). I'll expand them if I
> missed some cases.
>
> > 13. Will re-encryption continue after the cluster is completely stopped?
>
> Yes, as I mentioned earlier, we save the re-encryption status in the
> meta page of each re-encrypted partition and trigger re-encryption on
> node startup if needed (more detailed description on the wiki).
>
> Thanks a lot for your comments, I am still working on PR and expanding
> wiki documentation. I'll let you know when it will be ready for the
> review.
>
> On Tue, Jul 28, 2020 at 19:14, Alexey Goncharuk wrote:
> >
> > Hello Nikolay,
> >
> >
> > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > filePageStore - what should we do with this?
> > >
> > > filePageStore checks CRC of the encrypted page. This required to confirm
> > > the page not corrupted on the disk.
> > > encryptionFileIO checks CRC of the decrypted page(CRC itself stored in the
> > > encrypted data).
> > > This required to be sure the decrypted page contains correct data and not
> > > replaced with some malicious content.
> > >
> >
> > I still do not see why we need CRC twice, can you please elaborate on this
> > statement? If an attacker is able to replace the contents of an encrypted
> > page, it means that they have access to the encryption key. What will
> > prevent them from calculating the CRC of malicious content and then
> > encrypting it?


Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-31 Thread Pavel Pereslegin
Hello,

> 10. Question - CRC is read in two places encryptionFileIO and
> filePageStore - what should we do with this?

We need to calculate the CRC of the encrypted data because we may be
using the wrong encryption key to decrypt it, in which case we would
not be able to tell whether physical integrity is violated or the wrong
encryption key is used.

> 9. Question - How do we optimize when we can check that this page is
> already encrypted by parallel loading? Maybe we should do this in Phase 4?

To do this, we need to store the encryption key ID in memory (at
least), but this is not easy to do right now without breaking binary
compatibility.

> 7. Question -the current implementation does not use the throttling that
> is implemented in PDS. Users should set the throughput such as 5 MB per
> second, but not the timeout, packet size, or stream size.

I've added a simple rate limiter for this.
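
For readers who want to see the idea, here is a minimal Java sketch of a
throughput-based limiter. It is not the limiter from the PR: the class name,
the reserve(...) method and the explicit clock parameter are assumptions
made for illustration.

```java
// Hypothetical limiter: the caller reserves bytes and is told how long to wait.
final class SimpleRateLimiter {
    private final long bytesPerSecond;

    /** Earliest time (ns) at which the next batch may start; MIN_VALUE means "no batch yet". */
    private long nextFreeNanos = Long.MIN_VALUE;

    SimpleRateLimiter(long bytesPerSecond) {
        if (bytesPerSecond <= 0)
            throw new IllegalArgumentException("bytesPerSecond must be positive");

        this.bytesPerSecond = bytesPerSecond;
    }

    /**
     * Reserves the given number of bytes.
     *
     * @param bytes Size of the batch about to be written.
     * @param nowNanos Current time, e.g. System.nanoTime().
     * @return Nanoseconds the caller should wait before writing the batch.
     */
    synchronized long reserve(long bytes, long nowNanos) {
        long start = Math.max(nowNanos, nextFreeNanos);

        // Time budget consumed by this batch at the configured throughput.
        nextFreeNanos = start + bytes * 1_000_000_000L / bytesPerSecond;

        return start - nowNanos;
    }
}
```

A re-encryption thread would call reserve(batchBytes, System.nanoTime())
before each batch and park for the returned time (for example with
LockSupport.parkNanos), so the user configures only a throughput such as
5 MB per second.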

> 8. Question - why we add a lot of system properties?
>> Can you, please, list system properties that should be moved to the 
>> configuration?

It's about the background re-encryption properties; for now, they are:
- re-encryption speed limit (in megabytes per second)
- number of threads used for re-encryption
- number of pages in a batch processed under the checkpoint lock
- flag to completely disable background re-encryption

> 11. We should remember about complicated test scenarios with failover

The PR contains tests for re-encryption (and key rotation) on an unstable
topology (with and without baseline changes). I'll extend them if I
missed some cases.

> 13. Will re-encryption continue after the cluster is completely stopped?

Yes, as I mentioned earlier, we save the re-encryption status in the
meta page of each re-encrypted partition and trigger re-encryption on
node startup if needed (more detailed description on the wiki).

Thanks a lot for your comments. I am still working on the PR and expanding
the wiki documentation. I'll let you know when it is ready for
review.

On Tue, Jul 28, 2020 at 19:14, Alexey Goncharuk wrote:
>
> Hello Nikolay,
>
>
> > > 10. Question - CRC is read in two places encryptionFileIO and
> > filePageStore - what should we do with this?
> >
> > filePageStore checks CRC of the encrypted page. This required to confirm
> > the page not corrupted on the disk.
> > encryptionFileIO checks CRC of the decrypted page(CRC itself stored in the
> > encrypted data).
> > This required to be sure the decrypted page contains correct data and not
> > replaced with some malicious content.
> >
>
> I still do not see why we need CRC twice, can you please elaborate on this
> statement? If an attacker is able to replace the contents of an encrypted
> page, it means that they have access to the encryption key. What will
> prevent them from calculating the CRC of malicious content and then
> encrypting it?


Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-28 Thread Alexey Goncharuk
Hello Nikolay,


> > 10. Question - CRC is read in two places encryptionFileIO and
> filePageStore - what should we do with this?
>
> filePageStore checks CRC of the encrypted page. This required to confirm
> the page not corrupted on the disk.
> encryptionFileIO checks CRC of the decrypted page(CRC itself stored in the
> encrypted data).
> This required to be sure the decrypted page contains correct data and not
> replaced with some malicious content.
>

I still do not see why we need CRC twice, can you please elaborate on this
statement? If an attacker is able to replace the contents of an encrypted
page, it means that they have access to the encryption key. What will
prevent them from calculating the CRC of malicious content and then
encrypting it?


Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-24 Thread Nikolay Izhikov
Hello, Maksim.

Thanks for the summary.
From my point of view, we should focus on the Phase3 implementation and then
do the refactoring for some specific SPI implementation.

> 8. Question - why we add a lot of system properties?

Can you, please, list system properties that should be moved to the 
configuration?

> 10. Question - CRC is read in two places encryptionFileIO and filePageStore - 
> what should we do with this?

filePageStore checks the CRC of the encrypted page. This is required to
confirm that the page is not corrupted on disk.
encryptionFileIO checks the CRC of the decrypted page (the CRC itself is
stored in the encrypted data).
This is required to be sure that the decrypted page contains correct data and
has not been replaced with some malicious content.

Here is the list of items that are not related to the Phase3 implementation.
Please tell me what you think:

>   2. We should try to run the existing test suites in encryption mode.

We did it during TDE.Phase1 testing.

>  3. SPI requires an additional method such as getKeyDigest
>  4. Recommendation - the encryption processor should be divided into external 
> subclasses
 
> 5. Recommendation - we should not use tuples and triples, because this is a 
> marker of a design problem.
> 6. Strict recommendation - please don't put context everywhere

Actually, this is a question of taste and obviously not related to the current 
discussion.

> On Jul 24, 2020, at 14:27, Maksim Stepachev wrote:
> 
> Hello everyone, yesterday we discussed the implementation of TDE over a
> conference call. I added a summary of this call here:
> 
>   1. The wiki documentation should be expanded. It should describe the
>   steps - how it works under the hood. What are the domain objects in the
>   implementation?
>   2. We should try to run the existing test suites in encryption mode.
>   Encryption should not affect any PDS or other tests.
>   3. SPI requires an additional method such as getKeyDigest, because the
>   current implementation of GridEncryptionManager#masterKeyDigest() looks
>   strange. We reset the master key to calculate the digest. This will not
>   work well if we want to use VOLT as a key provider implementation.
>   4. Recommendation - the encryption processor should be divided into
>   external subclasses, and we should use the OOP decomposition pattern for
>   it. Right now, this class has more than 2000 lines and does not support
>   SOLID. This is similar to inline unrelated logic with a single class.
>   5. Recommendation - we should not use tuples and triples, because this
>   is a marker of a design problem.
>   6. Strict recommendation - please don't put context everywhere. it
>   should only be used in the parent class. You can pass the necessary
>   dependencies through the constructor, as in the DI pattern.
>   7. Question -the current implementation does not use the throttling that
>   is implemented in PDS. Users should set the throughput such as 5 MB per
>   second, but not the timeout, packet size, or stream size.
>   8. Question - why we add a lot of system properties? Why we didn’t add a
>   configuration for it?
>   9. Question - How do we optimize when we can check that this page is
>   already encrypted by parallel loading? Maybe we should do this in Phase 4?
>   10. Question - CRC is read in two places encryptionFileIO and
>   filePageStore - what should we do with this?
>   11. We should remember about complicated test scenarios with failover
>   like node left when encryption started and joined after it finished. In the
>   process, the baseline changed node left before / after / in the middle of
>   this process. And etc.
>   12. How to use a sandbox to protect our cluster of master and user key
>   stealing via compute?
>   13. Will re-encryption continue after the cluster is completely stopped?
> 
> If I forgot some points, you can add them to the message.
> 
> 
> On Tue, Jul 7, 2020 at 17:40, Pavel Pereslegin wrote:
> 
>> Hello, Maksim.
>> 
>> For implementation, I chose so-called "in place background
>> re-encryption" design.
>> 
>> The first step is to rotate the key for writing data, it only works on
>> the active cluster, at the moment..
>> The second step is re-encryption (to remove previous encryption key).
>> If node was restarted reencryption starts after metastorage becomes
>> ready for read/write. Each "re-encrypted" partition (including index)
>> has an attribute on the meta page that indicates whether background
>> re-encryption should be continued.
>> 
>> I updated the description in wiki [1].
>> Some more details in jira [2].
>> Draft PR [3].
>> 
>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
>> [2] https://issues.apache.org/jira/browse/IGNITE-12843
>> [3] https://github.com/apache/ignite/pull/7941
>> 
>> On Tue, Jul 7, 2020 at 13:49, Maksim Stepachev wrote:
>>> 
>>> Hi!
>>> 
>>> Do you have any updates about this issue? What types of implementations
>>> have you chosen (in-place, offline, or in the 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-24 Thread Maksim Stepachev
Hello everyone. Yesterday we discussed the implementation of TDE over a
conference call. Here is a summary of that call:

   1. The wiki documentation should be expanded. It should describe the
   steps, i.e. how it works under the hood. What are the domain objects in the
   implementation?
   2. We should try to run the existing test suites in encryption mode.
   Encryption should not affect any PDS or other tests.
   3. The SPI requires an additional method such as getKeyDigest, because the
   current implementation of GridEncryptionManager#masterKeyDigest() looks
   strange: we reset the master key to calculate the digest. This will not
   work well if we want to use VOLT as a key provider implementation.
   4. Recommendation - the encryption processor should be divided into
   external subclasses, and we should use the OOP decomposition pattern for
   it. Right now, this class has more than 2000 lines and does not follow
   SOLID. It looks like unrelated logic inlined into a single class.
   5. Recommendation - we should not use tuples and triples, because this
   is a marker of a design problem.
   6. Strict recommendation - please don't put the context everywhere. It
   should only be used in the parent class. You can pass the necessary
   dependencies through the constructor, as in the DI pattern.
   7. Question - the current implementation does not use the throttling that
   is implemented in PDS. Users should set the throughput, such as 5 MB per
   second, rather than the timeout, packet size, or stream size.
   8. Question - why do we add a lot of system properties? Why didn’t we add a
   configuration for it?
   9. Question - how do we optimize when we can check that this page is
   already encrypted by parallel loading? Maybe we should do this in Phase 4?
   10. Question - CRC is read in two places, encryptionFileIO and
   filePageStore. What should we do with this?
   11. We should remember about complicated test scenarios with failover,
   such as a node that left when encryption started and joined after it
   finished, or a baseline change where a node left before / after / in the
   middle of this process, and so on.
   12. How do we use a sandbox to protect our cluster from master and user key
   theft via compute?
   13. Will re-encryption continue after the cluster is completely stopped?

If I forgot some points, please add them to the thread.


On Tue, Jul 7, 2020 at 17:40, Pavel Pereslegin wrote:

> Hello, Maksim.
>
> For implementation, I chose so-called "in place background
> re-encryption" design.
>
> The first step is to rotate the key for writing data, it only works on
> the active cluster, at the moment..
> The second step is re-encryption (to remove previous encryption key).
> If node was restarted reencryption starts after metastorage becomes
> ready for read/write. Each "re-encrypted" partition (including index)
> has an attribute on the meta page that indicates whether background
> re-encryption should be continued.
>
> I updated the description in wiki [1].
> Some more details in jira [2].
> Draft PR [3].
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
> [2] https://issues.apache.org/jira/browse/IGNITE-12843
> [3] https://github.com/apache/ignite/pull/7941
>
> On Tue, Jul 7, 2020 at 13:49, Maksim Stepachev wrote:
> >
> > Hi!
> >
> > Do you have any updates about this issue? What types of implementations
> > have you chosen (in-place, offline, or in the background)? I know that we
> > want to add a partition defragmentation function, we can add a hole to
> > integrate the re-encryption scheme. Could you update your IEP with your
> > plans?
> >
> > On Mon, May 25, 2020 at 12:50, Pavel Pereslegin wrote:
> >
> > > Nikolay, Alexei,
> > >
> > > thanks for your suggestions.
> > >
> > > Offline re-encryption does not seem so simple, we need to read/replace
> > > the existing encryption keys on all nodes (therefore, we should be
> > > able to read/write metastore/WAL and exchange data between the
> > > baseline nodes). Re-encryption in maintenance mode (for example, in a
> > > stable read-only cluster) will be simple, but it still looks very
> > > inconvenient, at least because users will need to interrupt all
> > > operations.
> > >
> > > The main advantage of online "in place" re-encryption is that we'll
> > > support multiple keys for reading, and this procedure does not
> > > directly depend on background re-encryption.
> > >
> > > So, the first step is similar to rotating the master key when the new
> > > key was set for writing on all nodes - that’s it, the cache group key
> > > rotation is complete (this is what PCI DSS requires - encrypt new
> > > updates with new keys).
> > > The second step is to re-encrypt the existing data, As I said
> > > previously I thought about scanning all partition pages in some
> > > background mode (store progress on the metapage to continue after
> > > restart), but rebalance approach should also work here if I figure out
> > > how to automate 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-07 Thread Pavel Pereslegin
Hello, Maksim.

For implementation, I chose so-called "in place background
re-encryption" design.

The first step is to rotate the key used for writing data; at the moment,
this only works on an active cluster.
The second step is re-encryption (to remove the previous encryption key).
If a node is restarted, re-encryption resumes once the metastorage becomes
ready for read/write. Each partition being re-encrypted (including the index)
has an attribute on its meta page that indicates whether background
re-encryption should be continued.
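
The resume-after-restart behaviour described above can be sketched roughly
as follows. This is only an illustration, not the code from the PR:
PartitionMeta, BackgroundReencryption and all of their members are
hypothetical names, and the meta page is modelled as a plain in-memory
object. It shows how progress kept per partition lets the scan continue
from where it stopped, and how a "pages left" value can be derived from
that stored progress alone.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a partition's meta page; not Ignite's real page layout. */
final class PartitionMeta {
    /** Index of the next page that still has to be re-encrypted. */
    int nextPageIdx;

    /** Number of pages that were scheduled for re-encryption. */
    final int pageCnt;

    PartitionMeta(int pageCnt) {
        this.pageCnt = pageCnt;
    }

    /** The attribute that tells whether background re-encryption must continue. */
    boolean reencryptionPending() {
        return nextPageIdx < pageCnt;
    }
}

/** Hypothetical background task that re-encrypts partitions in small batches. */
final class BackgroundReencryption {
    /** Partition id -> progress. A real node would persist this on disk. */
    private final Map<Integer, PartitionMeta> metaPages = new HashMap<>();

    void schedule(int partId, int pageCnt) {
        metaPages.put(partId, new PartitionMeta(pageCnt));
    }

    /** Processes up to batchSize pages of one partition and returns how many were processed. */
    int processBatch(int partId, int batchSize) {
        PartitionMeta meta = metaPages.get(partId);

        if (meta == null || !meta.reencryptionPending())
            return 0;

        int end = Math.min(meta.pageCnt, meta.nextPageIdx + batchSize);
        int done = end - meta.nextPageIdx;

        // A real implementation would read each page here and write it back
        // encrypted with the new key before advancing the stored progress.
        meta.nextPageIdx = end;

        return done;
    }

    /** Pages left across all partitions, computed from stored progress only. */
    long pagesLeft() {
        long left = 0;

        for (PartitionMeta m : metaPages.values())
            left += m.pageCnt - m.nextPageIdx;

        return left;
    }
}
```

In this sketch a restart simply means calling processBatch again: the
scan picks up at nextPageIdx instead of starting from the first page.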

I updated the description in the wiki [1].
More details are in Jira [2].
The draft PR is [3].

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
[2] https://issues.apache.org/jira/browse/IGNITE-12843
[3] https://github.com/apache/ignite/pull/7941

Tue, Jul 7, 2020 at 13:49, Maksim Stepachev :
>
> Hi!
>
> Do you have any updates on this issue? Which implementation approach
> have you chosen (in-place, offline, or in the background)? I know that we
> want to add a partition defragmentation function; we could add a hook there
> to integrate the re-encryption scheme. Could you update your IEP with your
> plans?
>
> Mon, May 25, 2020 at 12:50, Pavel Pereslegin :
>
> > Nikolay, Alexei,
> >
> > thanks for your suggestions.
> >
> > Offline re-encryption does not seem so simple, we need to read/replace
> > the existing encryption keys on all nodes (therefore, we should be
> > able to read/write metastore/WAL and exchange data between the
> > baseline nodes). Re-encryption in maintenance mode (for example, in a
> > stable read-only cluster) will be simple, but it still looks very
> > inconvenient, at least because users will need to interrupt all
> > operations.
> >
> > The main advantage of online "in place" re-encryption is that we'll
> > support multiple keys for reading, and this procedure does not
> > directly depend on background re-encryption.
> >
> > So, the first step is similar to rotating the master key when the new
> > key was set for writing on all nodes - that’s it, the cache group key
> > rotation is complete (this is what PCI DSS requires - encrypt new
> > updates with new keys).
> > The second step is to re-encrypt the existing data, As I said
> > previously I thought about scanning all partition pages in some
> > background mode (store progress on the metapage to continue after
> > restart), but rebalance approach should also work here if I figure out
> > how to automate this process.
> >
> > пн, 25 мая 2020 г. в 12:22, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com>:
> > >
> > >
> > >
> > > пн, 25 мая 2020 г. в 12:00, Nikolay Izhikov :
> > >>
> > >> > This willl takes us to the re-encryption using full rebalancing
> > >>
> > >> Rebalance will require 2x efforts for reencryption
> > >>
> > >> 1. Read and send data from supplier node.
> > >> 2. Reencrypt and write data on demander node.
> > >>
> > >> Instead of
> > >>
> > >> 1. Read, reencrypt and write data on «demander» node.
> > >
> > >
> > > Usually, reading and sending is not a bottleneck. And don't forget we
> > can run out of WAL history and fall back to full rebalancing with partition
> > eviction eliminating all efforts from offline re-encryption.
> > >
> > > On the other side, for a grid having many nodes one-by-one re-encryption
> > can take a long time.
> > > It should also be possible to re-encrypt all data as fast as possible
> > if, for example, if a load can be switched to another grid, where offline
> > encryption will come in handy.
> > >
> > > So, I suggest to implement offline re-encryption and online
> > re-encryption using rebalancing as a first step.
> > >
> > > Next step can be online in-place re-encryption. It's important to
> > measure business impact from it on online grid.
> > >
> > >>
> > >>
> > >>
> > >> > 25 мая 2020 г., в 11:46, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com> написал(а):
> > >> >
> > >> > For me, the one big disadvantage for offline re-encryption is the
> > >> > possibility to run out of WAL history.
> > >> > If an re-encryption takes a long time we will get full rebalancing
> > with
> > >> > partition eviction.
> > >> > This willl takes us to the re-encryption using full rebalancing,
> > proposed
> > >> > by me earlier.
> > >> >
> > >> >
> > >> >
> > >> > пн, 25 мая 2020 г. в 11:27, Nikolay Izhikov :
> > >> >
> > >> >>> And definitely this approach is much simplier to implement
> > >> >>
> > >> >> I agree.
> > >> >>
> > >> >> If we allow to made nodes offline for reencryption then we can
> > implement a
> > >> >> fully offline procedure:
> > >> >>
> > >> >> 1. Stop node.
> > >> >> 2. Execute some control.sh command that will reencrypt all data
> > without
> > >> >> starting node
> > >> >> 3. Start node.
> > >> >>
> > >> >> Pavel, can you, please, write it one more time - what disadvantages
> > in
> > >> >> offline procedure?
> > >> >>
> > >> >>> 25 мая 2020 г., в 11:20, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com>
> > >> >> написал(а):
> > >> >>>
> > >> >>> 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-07 Thread Maksim Stepachev
Hi!

Do you have any updates on this issue? Which implementation approach
have you chosen (in-place, offline, or in the background)? I know that we
want to add a partition defragmentation function; we could add a hook there
to integrate the re-encryption scheme. Could you update your IEP with your
plans?

Mon, May 25, 2020 at 12:50, Pavel Pereslegin :

> Nikolay, Alexei,
>
> thanks for your suggestions.
>
> Offline re-encryption does not seem so simple, we need to read/replace
> the existing encryption keys on all nodes (therefore, we should be
> able to read/write metastore/WAL and exchange data between the
> baseline nodes). Re-encryption in maintenance mode (for example, in a
> stable read-only cluster) will be simple, but it still looks very
> inconvenient, at least because users will need to interrupt all
> operations.
>
> The main advantage of online "in place" re-encryption is that we'll
> support multiple keys for reading, and this procedure does not
> directly depend on background re-encryption.
>
> So, the first step is similar to rotating the master key when the new
> key was set for writing on all nodes - that’s it, the cache group key
> rotation is complete (this is what PCI DSS requires - encrypt new
> updates with new keys).
> The second step is to re-encrypt the existing data, As I said
> previously I thought about scanning all partition pages in some
> background mode (store progress on the metapage to continue after
> restart), but rebalance approach should also work here if I figure out
> how to automate this process.
>
> пн, 25 мая 2020 г. в 12:22, Alexei Scherbakov <
> alexey.scherbak...@gmail.com>:
> >
> >
> >
> > пн, 25 мая 2020 г. в 12:00, Nikolay Izhikov :
> >>
> >> > This willl takes us to the re-encryption using full rebalancing
> >>
> >> Rebalance will require 2x efforts for reencryption
> >>
> >> 1. Read and send data from supplier node.
> >> 2. Reencrypt and write data on demander node.
> >>
> >> Instead of
> >>
> >> 1. Read, reencrypt and write data on «demander» node.
> >
> >
> > Usually, reading and sending is not a bottleneck. And don't forget we
> can run out of WAL history and fall back to full rebalancing with partition
> eviction eliminating all efforts from offline re-encryption.
> >
> > On the other side, for a grid having many nodes one-by-one re-encryption
> can take a long time.
> > It should also be possible to re-encrypt all data as fast as possible
> if, for example, if a load can be switched to another grid, where offline
> encryption will come in handy.
> >
> > So, I suggest to implement offline re-encryption and online
> re-encryption using rebalancing as a first step.
> >
> > Next step can be online in-place re-encryption. It's important to
> measure business impact from it on online grid.
> >
> >>
> >>
> >>
> >> > 25 мая 2020 г., в 11:46, Alexei Scherbakov <
> alexey.scherbak...@gmail.com> написал(а):
> >> >
> >> > For me, the one big disadvantage for offline re-encryption is the
> >> > possibility to run out of WAL history.
> >> > If an re-encryption takes a long time we will get full rebalancing
> with
> >> > partition eviction.
> >> > This willl takes us to the re-encryption using full rebalancing,
> proposed
> >> > by me earlier.
> >> >
> >> >
> >> >
> >> > пн, 25 мая 2020 г. в 11:27, Nikolay Izhikov :
> >> >
> >> >>> And definitely this approach is much simplier to implement
> >> >>
> >> >> I agree.
> >> >>
> >> >> If we allow to made nodes offline for reencryption then we can
> implement a
> >> >> fully offline procedure:
> >> >>
> >> >> 1. Stop node.
> >> >> 2. Execute some control.sh command that will reencrypt all data
> without
> >> >> starting node
> >> >> 3. Start node.
> >> >>
> >> >> Pavel, can you, please, write it one more time - what disadvantages
> in
> >> >> offline procedure?
> >> >>
> >> >>> 25 мая 2020 г., в 11:20, Alexei Scherbakov <
> alexey.scherbak...@gmail.com>
> >> >> написал(а):
> >> >>>
> >> >>> And definitely this approach is much simplier to implement because
> all
> >> >>> corner cases are handled by rebalancing code.
> >> >>>
> >> >>> пн, 25 мая 2020 г. в 11:16, Alexei Scherbakov <
> >> >> alexey.scherbak...@gmail.com
> >>  :
> >> >>>
> >>  I mean: serving supply requests.
> >> 
> >>  пн, 25 мая 2020 г. в 11:15, Alexei Scherbakov <
> >>  alexey.scherbak...@gmail.com>:
> >> 
> >> > Nikolay,
> >> >
> >> > Can you explain why such restriction is necessary ?
> >> > Most likely having a currently re-encrypting node serving only
> demand
> >> > requests will have least preformance impact on a grid.
> >> >
> >> > пн, 25 мая 2020 г. в 11:08, Nikolay Izhikov  >:
> >> >
> >> >> Hello, Alexei.
> >> >>
> >> >> I think we want to implement this feature without nodes restart.
> >> >> In the ideal scenario all nodes will stay alive and respond to
> the
> >> >> user
> >> >> requests.
> >> >>
> >> >>> 24 мая 2020 г., в 15:24, Alexei Scherbakov <
> >> >> 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Pavel Pereslegin
Nikolay, Alexei,

thanks for your suggestions.

Offline re-encryption does not seem so simple: we would need to read and
replace the existing encryption keys on all nodes (so we would have to be
able to read/write the metastore and WAL and exchange data between the
baseline nodes). Re-encryption in maintenance mode (for example, in a
stable read-only cluster) would be simple, but it still looks very
inconvenient, not least because users would need to interrupt all
operations.

The main advantage of online "in place" re-encryption is that we'll
support multiple keys for reading, so the key rotation itself does not
directly depend on background re-encryption.

So, the first step is similar to rotating the master key: once the new
key has been set for writing on all nodes, the cache group key rotation
is complete (this is what PCI DSS requires: new updates are encrypted
with the new key).
The second step is to re-encrypt the existing data. As I said
previously, I thought about scanning all partition pages in some
background mode (storing progress on the meta page to continue after a
restart), but the rebalance approach should also work here if I figure out
how to automate this process.
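
A rough sketch of the "several keys for reading, one key for writing"
idea from the first step. This is an assumption-laden illustration, not
Ignite's API: GroupKeyRing and its methods are hypothetical names, keys
are raw byte arrays, and the per-page key identifier is represented by a
plain int.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-cache-group key ring: many keys for reading, one for writing. */
final class GroupKeyRing {
    /** Key identifier -> key material. Old keys stay here until re-encryption ends. */
    private final Map<Integer, byte[]> keys = new HashMap<>();

    /** Identifier of the only key used to encrypt new writes. */
    private int activeKeyId;

    GroupKeyRing(int keyId, byte[] key) {
        keys.put(keyId, key);
        activeKeyId = keyId;
    }

    /** Step one: the new key becomes the write key, the old one stays readable. */
    void rotate(int newKeyId, byte[] newKey) {
        if (keys.containsKey(newKeyId))
            throw new IllegalArgumentException("Key id already in use: " + newKeyId);

        keys.put(newKeyId, newKey);
        activeKeyId = newKeyId;
    }

    int activeKeyId() {
        return activeKeyId;
    }

    /** Looks up the key by the identifier stored on an encrypted page. */
    byte[] keyForRead(int keyId) {
        byte[] key = keys.get(keyId);

        if (key == null)
            throw new IllegalStateException("Unknown key id: " + keyId);

        return key;
    }

    /** After step two has finished: drop an old key. The active key is never removed. */
    boolean removeOldKey(int keyId) {
        if (keyId == activeKeyId)
            return false;

        return keys.remove(keyId) != null;
    }
}
```

With such a structure, rotation is complete as soon as rotate() has been
applied on every node, while pages written with the previous key remain
readable until background re-encryption has rewritten them.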

Mon, May 25, 2020 at 12:22, Alexei Scherbakov :
>
>
>
> Mon, May 25, 2020 at 12:00, Nikolay Izhikov :
>>
>> > This willl takes us to the re-encryption using full rebalancing
>>
>> Rebalance will require 2x efforts for reencryption
>>
>> 1. Read and send data from supplier node.
>> 2. Reencrypt and write data on demander node.
>>
>> Instead of
>>
>> 1. Read, reencrypt and write data on «demander» node.
>
>
> Usually, reading and sending is not a bottleneck. And don't forget we can run 
> out of WAL history and fall back to full rebalancing with partition eviction 
> eliminating all efforts from offline re-encryption.
>
> On the other side, for a grid having many nodes one-by-one re-encryption can 
> take a long time.
> It should also be possible to re-encrypt all data as fast as possible if, for 
> example, if a load can be switched to another grid, where offline encryption 
> will come in handy.
>
> So, I suggest to implement offline re-encryption and online re-encryption 
> using rebalancing as a first step.
>
> Next step can be online in-place re-encryption. It's important to measure 
> business impact from it on online grid.
>
>>
>>
>>
>> > 25 мая 2020 г., в 11:46, Alexei Scherbakov  
>> > написал(а):
>> >
>> > For me, the one big disadvantage for offline re-encryption is the
>> > possibility to run out of WAL history.
>> > If an re-encryption takes a long time we will get full rebalancing with
>> > partition eviction.
>> > This willl takes us to the re-encryption using full rebalancing, proposed
>> > by me earlier.
>> >
>> >
>> >
>> > пн, 25 мая 2020 г. в 11:27, Nikolay Izhikov :
>> >
>> >>> And definitely this approach is much simplier to implement
>> >>
>> >> I agree.
>> >>
>> >> If we allow to made nodes offline for reencryption then we can implement a
>> >> fully offline procedure:
>> >>
>> >> 1. Stop node.
>> >> 2. Execute some control.sh command that will reencrypt all data without
>> >> starting node
>> >> 3. Start node.
>> >>
>> >> Pavel, can you, please, write it one more time - what disadvantages in
>> >> offline procedure?
>> >>
>> >>> 25 мая 2020 г., в 11:20, Alexei Scherbakov 
>> >> написал(а):
>> >>>
>> >>> And definitely this approach is much simplier to implement because all
>> >>> corner cases are handled by rebalancing code.
>> >>>
>> >>> пн, 25 мая 2020 г. в 11:16, Alexei Scherbakov <
>> >> alexey.scherbak...@gmail.com
>>  :
>> >>>
>>  I mean: serving supply requests.
>> 
>>  пн, 25 мая 2020 г. в 11:15, Alexei Scherbakov <
>>  alexey.scherbak...@gmail.com>:
>> 
>> > Nikolay,
>> >
>> > Can you explain why such restriction is necessary ?
>> > Most likely having a currently re-encrypting node serving only demand
>> > requests will have least preformance impact on a grid.
>> >
>> > пн, 25 мая 2020 г. в 11:08, Nikolay Izhikov :
>> >
>> >> Hello, Alexei.
>> >>
>> >> I think we want to implement this feature without nodes restart.
>> >> In the ideal scenario all nodes will stay alive and respond to the
>> >> user
>> >> requests.
>> >>
>> >>> 24 мая 2020 г., в 15:24, Alexei Scherbakov <
>> >> alexey.scherbak...@gmail.com> написал(а):
>> >>>
>> >>> Pavel Pereslegin,
>> >>>
>> >>> I see another opportunity.
>> >>> We can use rebalancing to re-encrypt node data with a new key.
>> >>> It's a trivial procedure for me: stop a node, clear database, change
>> >> a
>> >> key,
>> >>> start node and wait for rebalancing to complete.
>> >>> Data will be re-encrypted during rebalancing.
>> >>>
>> >>> Did I miss something ?
>> >>>
>> >>> пт, 22 мая 2020 г. в 16:14, Ivan Rakov :
>> >>>
>>  Folks,
>> 
>>  Just keeping you informed: I and my colleagues are highly interested
>> >> in TDE
>>  

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
Mon, May 25, 2020 at 12:00, Nikolay Izhikov :

> > This willl takes us to the re-encryption using full rebalancing
>
> Rebalance will require 2x efforts for reencryption
>
> 1. Read and send data from supplier node.
> 2. Reencrypt and write data on demander node.
>
> Instead of
>
> 1. Read, reencrypt and write data on «demander» node.
>

Usually, reading and sending are not the bottleneck. And don't forget that
we can run out of WAL history and fall back to full rebalancing with
partition eviction, which wastes all the effort spent on offline
re-encryption.

On the other hand, for a grid with many nodes, one-by-one re-encryption
can take a long time.
It should also be possible to re-encrypt all data as fast as possible,
for example when the load can be switched to another grid; that is where
offline re-encryption will come in handy.

So, I suggest implementing offline re-encryption and online re-encryption
using rebalancing as a first step.

The next step can be online in-place re-encryption. It's important to
measure its business impact on an online grid.


>
>
> > 25 мая 2020 г., в 11:46, Alexei Scherbakov 
> написал(а):
> >
> > For me, the one big disadvantage for offline re-encryption is the
> > possibility to run out of WAL history.
> > If an re-encryption takes a long time we will get full rebalancing with
> > partition eviction.
> > This willl takes us to the re-encryption using full rebalancing, proposed
> > by me earlier.
> >
> >
> >
> > пн, 25 мая 2020 г. в 11:27, Nikolay Izhikov :
> >
> >>> And definitely this approach is much simplier to implement
> >>
> >> I agree.
> >>
> >> If we allow to made nodes offline for reencryption then we can
> implement a
> >> fully offline procedure:
> >>
> >> 1. Stop node.
> >> 2. Execute some control.sh command that will reencrypt all data without
> >> starting node
> >> 3. Start node.
> >>
> >> Pavel, can you, please, write it one more time - what disadvantages in
> >> offline procedure?
> >>
> >>> 25 мая 2020 г., в 11:20, Alexei Scherbakov <
> alexey.scherbak...@gmail.com>
> >> написал(а):
> >>>
> >>> And definitely this approach is much simplier to implement because all
> >>> corner cases are handled by rebalancing code.
> >>>
> >>> пн, 25 мая 2020 г. в 11:16, Alexei Scherbakov <
> >> alexey.scherbak...@gmail.com
>  :
> >>>
>  I mean: serving supply requests.
> 
>  пн, 25 мая 2020 г. в 11:15, Alexei Scherbakov <
>  alexey.scherbak...@gmail.com>:
> 
> > Nikolay,
> >
> > Can you explain why such restriction is necessary ?
> > Most likely having a currently re-encrypting node serving only demand
> > requests will have least preformance impact on a grid.
> >
> > пн, 25 мая 2020 г. в 11:08, Nikolay Izhikov :
> >
> >> Hello, Alexei.
> >>
> >> I think we want to implement this feature without nodes restart.
> >> In the ideal scenario all nodes will stay alive and respond to the
> >> user
> >> requests.
> >>
> >>> 24 мая 2020 г., в 15:24, Alexei Scherbakov <
> >> alexey.scherbak...@gmail.com> написал(а):
> >>>
> >>> Pavel Pereslegin,
> >>>
> >>> I see another opportunity.
> >>> We can use rebalancing to re-encrypt node data with a new key.
> >>> It's a trivial procedure for me: stop a node, clear database,
> change
> >> a
> >> key,
> >>> start node and wait for rebalancing to complete.
> >>> Data will be re-encrypted during rebalancing.
> >>>
> >>> Did I miss something ?
> >>>
> >>> пт, 22 мая 2020 г. в 16:14, Ivan Rakov :
> >>>
>  Folks,
> 
>  Just keeping you informed: I and my colleagues are highly
> interested
> >> in TDE
>  in general and keys rotations specifically, but we don't have
> enough
> >> time
>  so far.
>  We'll dive into this feature and participate in reviews next
> month.
> 
>  --
>  Best Regards,
>  Ivan Rakov
> 
>  On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <
> xxt...@gmail.com
> >>>
>  wrote:
> 
> > Hello, Alexey.
> >
> >> is the encryption key for the data the same on all nodes in the
>  cluster?
> > Yes, each encrypted cache group has its own encryption key, the
> key
> >> is
> > the same on all nodes.
> >
> >> Clearly, during the re-encryption there will exist pages
> >> encrypted with both new and old keys at the same time.
> > Yes, there will be pages encrypted with different keys at the
> same
> >> time.
> > Currently, we only store one key for one cache group. To rotate a
> >> key,
> > at a certain point in time it is necessary to support several
> keys
> >> (at
> > least for reading the WAL).
> > For the "in place" strategy, we'll store the encryption key
> >> identifier
> > on each encrypted page (we currently have some unused space on
> > encrypted 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Nikolay Izhikov
> This will lead us to re-encryption using full rebalancing

Rebalancing will require twice the effort for re-encryption:

1. Read and send data from the supplier node.
2. Re-encrypt and write data on the demander node.

instead of:

1. Read, re-encrypt and write data on the «demander» node.


> On May 25, 2020, at 11:46, Alexei Scherbakov wrote:
> 
> For me, the one big disadvantage of offline re-encryption is the
> possibility of running out of WAL history.
> If re-encryption takes a long time, we will get full rebalancing with
> partition eviction.
> This will lead us to re-encryption using full rebalancing, which I
> proposed earlier.
> 
> 
> 
> Mon, May 25, 2020 at 11:27, Nikolay Izhikov :
> 
>>> And definitely this approach is much simplier to implement
>> 
>> I agree.
>> 
>> If we allow to made nodes offline for reencryption then we can implement a
>> fully offline procedure:
>> 
>> 1. Stop node.
>> 2. Execute some control.sh command that will reencrypt all data without
>> starting node
>> 3. Start node.
>> 
>> Pavel, can you, please, write it one more time - what disadvantages in
>> offline procedure?
>> 
>>> 25 мая 2020 г., в 11:20, Alexei Scherbakov 
>> написал(а):
>>> 
>>> And definitely this approach is much simplier to implement because all
>>> corner cases are handled by rebalancing code.
>>> 
>>> пн, 25 мая 2020 г. в 11:16, Alexei Scherbakov <
>> alexey.scherbak...@gmail.com
 :
>>> 
 I mean: serving supply requests.
 
 пн, 25 мая 2020 г. в 11:15, Alexei Scherbakov <
 alexey.scherbak...@gmail.com>:
 
> Nikolay,
> 
> Can you explain why such restriction is necessary ?
> Most likely having a currently re-encrypting node serving only demand
> requests will have least preformance impact on a grid.
> 
> пн, 25 мая 2020 г. в 11:08, Nikolay Izhikov :
> 
>> Hello, Alexei.
>> 
>> I think we want to implement this feature without nodes restart.
>> In the ideal scenario all nodes will stay alive and respond to the
>> user
>> requests.
>> 
>>> 24 мая 2020 г., в 15:24, Alexei Scherbakov <
>> alexey.scherbak...@gmail.com> написал(а):
>>> 
>>> Pavel Pereslegin,
>>> 
>>> I see another opportunity.
>>> We can use rebalancing to re-encrypt node data with a new key.
>>> It's a trivial procedure for me: stop a node, clear database, change
>> a
>> key,
>>> start node and wait for rebalancing to complete.
>>> Data will be re-encrypted during rebalancing.
>>> 
>>> Did I miss something ?
>>> 
>>> пт, 22 мая 2020 г. в 16:14, Ivan Rakov :
>>> 
 Folks,
 
 Just keeping you informed: I and my colleagues are highly interested
>> in TDE
 in general and keys rotations specifically, but we don't have enough
>> time
 so far.
 We'll dive into this feature and participate in reviews next month.
 
 --
 Best Regards,
 Ivan Rakov
 
 On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin >> 
 wrote:
 
> Hello, Alexey.
> 
>> is the encryption key for the data the same on all nodes in the
 cluster?
> Yes, each encrypted cache group has its own encryption key, the key
>> is
> the same on all nodes.
> 
>> Clearly, during the re-encryption there will exist pages
>> encrypted with both new and old keys at the same time.
> Yes, there will be pages encrypted with different keys at the same
>> time.
> Currently, we only store one key for one cache group. To rotate a
>> key,
> at a certain point in time it is necessary to support several keys
>> (at
> least for reading the WAL).
> For the "in place" strategy, we'll store the encryption key
>> identifier
> on each encrypted page (we currently have some unused space on
> encrypted page, so I don't expect any memory overhead here). Thus,
>> we
> will have several keys for reading and one key for writing. I
>> assume
> that the old key will be automatically deleted when a specific WAL
> segment is deleted (and re-encryption is finished).
> 
>> Will a node continue to re-encrypt the data after it restarts?
> Yes.
> 
>> If a node goes down during the re-encryption, but the rest of the
>> cluster finishes re-encryption, will we consider the procedure
 complete?
> I'm not sure, but it looks like the key rotation is complete when
>> we
> set the new key on all nodes so that the updates will be encrypted
> with the new key (as required by PCI DSS).
> Status of re-encryption can be obtained separately (locally or
>> cluster
> wide).
> 
> I forgot to mention that with “in place” re-encryption it will be
> impossible to quickly cancel re-encryption, because by 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
For me, the one big disadvantage of offline re-encryption is the
possibility of running out of WAL history.
If re-encryption takes a long time, we will get full rebalancing with
partition eviction.
This will lead us to re-encryption using full rebalancing, which I
proposed earlier.
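
The fallback described above can be pictured as a simple decision. The
sketch below is a deliberate simplification under stated assumptions:
RebalanceChoice, its Mode enum and the counter parameters are
hypothetical names, and comparing a single update counter against the
oldest counter still covered by WAL is only a model of how a returning
node ends up with either WAL-based or full rebalancing.

```java
/** Hypothetical model of choosing a rebalance mode for a node that was offline. */
final class RebalanceChoice {
    enum Mode { NONE, HISTORICAL, FULL }

    /**
     * @param nodeCntr Last update counter applied by the returning node.
     * @param clusterCntr Current update counter on the up-to-date owners.
     * @param earliestWalCntr Oldest counter from which WAL can still be replayed.
     */
    static Mode choose(long nodeCntr, long clusterCntr, long earliestWalCntr) {
        // Nothing was missed while the node was offline.
        if (nodeCntr >= clusterCntr)
            return Mode.NONE;

        // The missed updates are still in WAL: replay only the difference.
        // Otherwise the partition is evicted and fully reloaded, so the
        // offline re-encryption work on that partition is thrown away.
        return nodeCntr >= earliestWalCntr ? Mode.HISTORICAL : Mode.FULL;
    }
}
```

The longer offline re-encryption takes, the further earliestWalCntr moves
ahead of the node's own counter, and the more likely the FULL branch
becomes.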



Mon, May 25, 2020 at 11:27, Nikolay Izhikov :

> > And definitely this approach is much simpler to implement
>
> I agree.
>
> If we allow taking nodes offline for re-encryption, then we can implement a
> fully offline procedure:
>
> 1. Stop the node.
> 2. Execute some control.sh command that will re-encrypt all data without
> starting the node.
> 3. Start the node.
>
> Pavel, can you please write one more time what the disadvantages of the
> offline procedure are?
>
> > 25 мая 2020 г., в 11:20, Alexei Scherbakov 
> написал(а):
> >
> > And definitely this approach is much simplier to implement because all
> > corner cases are handled by rebalancing code.
> >
> > пн, 25 мая 2020 г. в 11:16, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >> :
> >
> >> I mean: serving supply requests.
> >>
> >> пн, 25 мая 2020 г. в 11:15, Alexei Scherbakov <
> >> alexey.scherbak...@gmail.com>:
> >>
> >>> Nikolay,
> >>>
> >>> Can you explain why such restriction is necessary ?
> >>> Most likely having a currently re-encrypting node serving only demand
> >>> requests will have least preformance impact on a grid.
> >>>
> >>> пн, 25 мая 2020 г. в 11:08, Nikolay Izhikov :
> >>>
>  Hello, Alexei.
> 
>  I think we want to implement this feature without nodes restart.
>  In the ideal scenario all nodes will stay alive and respond to the
> user
>  requests.
> 
> > 24 мая 2020 г., в 15:24, Alexei Scherbakov <
>  alexey.scherbak...@gmail.com> написал(а):
> >
> > Pavel Pereslegin,
> >
> > I see another opportunity.
> > We can use rebalancing to re-encrypt node data with a new key.
> > It's a trivial procedure for me: stop a node, clear database, change
> a
>  key,
> > start node and wait for rebalancing to complete.
> > Data will be re-encrypted during rebalancing.
> >
> > Did I miss something ?
> >
> > пт, 22 мая 2020 г. в 16:14, Ivan Rakov :
> >
> >> Folks,
> >>
> >> Just keeping you informed: I and my colleagues are highly interested
>  in TDE
> >> in general and keys rotations specifically, but we don't have enough
>  time
> >> so far.
> >> We'll dive into this feature and participate in reviews next month.
> >>
> >> --
> >> Best Regards,
> >> Ivan Rakov
> >>
> >> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin  >
> >> wrote:
> >>
> >>> Hello, Alexey.
> >>>
>  is the encryption key for the data the same on all nodes in the
> >> cluster?
> >>> Yes, each encrypted cache group has its own encryption key, the key
>  is
> >>> the same on all nodes.
> >>>
>  Clearly, during the re-encryption there will exist pages
>  encrypted with both new and old keys at the same time.
> >>> Yes, there will be pages encrypted with different keys at the same
>  time.
> >>> Currently, we only store one key for one cache group. To rotate a
>  key,
> >>> at a certain point in time it is necessary to support several keys
>  (at
> >>> least for reading the WAL).
> >>> For the "in place" strategy, we'll store the encryption key
>  identifier
> >>> on each encrypted page (we currently have some unused space on
> >>> encrypted page, so I don't expect any memory overhead here). Thus,
> we
> >>> will have several keys for reading and one key for writing. I
> assume
> >>> that the old key will be automatically deleted when a specific WAL
> >>> segment is deleted (and re-encryption is finished).
> >>>
>  Will a node continue to re-encrypt the data after it restarts?
> >>> Yes.
> >>>
>  If a node goes down during the re-encryption, but the rest of the
>  cluster finishes re-encryption, will we consider the procedure
> >> complete?
> >>> I'm not sure, but it looks like the key rotation is complete when
> we
> >>> set the new key on all nodes so that the updates will be encrypted
> >>> with the new key (as required by PCI DSS).
> >>> Status of re-encryption can be obtained separately (locally or
>  cluster
> >>> wide).
> >>>
> >>> I forgot to mention that with “in place” re-encryption it will be
> >>> impossible to quickly cancel re-encryption, because by canceling we
> >>> mean re-encryption with the old key.
> >>>
>  How do you see the whole key rotation procedure will work?
> >>> Initial design for re-encryption with "partition copying" is
>  described
> >>> here [1]. I'll prepare detailed design for "in place" re-encryption
>  if
> >>> we'll go this way. In short, send the new encryption key
>  cluster-wide,
> >>> each 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Nikolay Izhikov
> And definitely this approach is much simpler to implement

I agree.

If we allow taking nodes offline for re-encryption, then we can implement a
fully offline procedure:

1. Stop the node.
2. Execute some control.sh command that will re-encrypt all data without
starting the node.
3. Start the node.

Pavel, can you please write one more time what the disadvantages of the
offline procedure are?

> On May 25, 2020, at 11:20, Alexei Scherbakov wrote:
> 
> And definitely this approach is much simplier to implement because all
> corner cases are handled by rebalancing code.
> 
> пн, 25 мая 2020 г. в 11:16, Alexei Scherbakov > :
> 
>> I mean: serving supply requests.
>> 
>> пн, 25 мая 2020 г. в 11:15, Alexei Scherbakov <
>> alexey.scherbak...@gmail.com>:
>> 
>>> Nikolay,
>>> 
>>> Can you explain why such restriction is necessary ?
>>> Most likely having a currently re-encrypting node serving only demand
>>> requests will have least preformance impact on a grid.
>>> 
>>> пн, 25 мая 2020 г. в 11:08, Nikolay Izhikov :
>>> 
 Hello, Alexei.
 
 I think we want to implement this feature without nodes restart.
 In the ideal scenario all nodes will stay alive and respond to the user
 requests.
 
> 24 мая 2020 г., в 15:24, Alexei Scherbakov <
 alexey.scherbak...@gmail.com> написал(а):
> 
> Pavel Pereslegin,
> 
> I see another opportunity.
> We can use rebalancing to re-encrypt node data with a new key.
> It's a trivial procedure for me: stop a node, clear database, change a
 key,
> start node and wait for rebalancing to complete.
> Data will be re-encrypted during rebalancing.
> 
> Did I miss something ?
> 
> пт, 22 мая 2020 г. в 16:14, Ivan Rakov :
> 
>> Folks,
>> 
>> Just keeping you informed: I and my colleagues are highly interested
 in TDE
>> in general and keys rotations specifically, but we don't have enough
 time
>> so far.
>> We'll dive into this feature and participate in reviews next month.
>> 
>> --
>> Best Regards,
>> Ivan Rakov
>> 
>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
>> wrote:
>> 
>>> Hello, Alexey.
>>> 
 is the encryption key for the data the same on all nodes in the
>> cluster?
>>> Yes, each encrypted cache group has its own encryption key, the key
 is
>>> the same on all nodes.
>>> 
 Clearly, during the re-encryption there will exist pages
 encrypted with both new and old keys at the same time.
>>> Yes, there will be pages encrypted with different keys at the same
 time.
>>> Currently, we only store one key for one cache group. To rotate a
 key,
>>> at a certain point in time it is necessary to support several keys
 (at
>>> least for reading the WAL).
>>> For the "in place" strategy, we'll store the encryption key
 identifier
>>> on each encrypted page (we currently have some unused space on
>>> encrypted page, so I don't expect any memory overhead here). Thus, we
>>> will have several keys for reading and one key for writing. I assume
>>> that the old key will be automatically deleted when a specific WAL
>>> segment is deleted (and re-encryption is finished).
>>> 
 Will a node continue to re-encrypt the data after it restarts?
>>> Yes.
>>> 
 If a node goes down during the re-encryption, but the rest of the
 cluster finishes re-encryption, will we consider the procedure
>> complete?
>>> I'm not sure, but it looks like the key rotation is complete when we
>>> set the new key on all nodes so that the updates will be encrypted
>>> with the new key (as required by PCI DSS).
>>> Status of re-encryption can be obtained separately (locally or
 cluster
>>> wide).
>>> 
>>> I forgot to mention that with “in place” re-encryption it will be
>>> impossible to quickly cancel re-encryption, because by canceling we
>>> mean re-encryption with the old key.
>>> 
 How do you see the whole key rotation procedure will work?
>>> Initial design for re-encryption with "partition copying" is
 described
>>> here [1]. I'll prepare detailed design for "in place" re-encryption
 if
>>> we'll go this way. In short, send the new encryption key
 cluster-wide,
>>> each node adds a new key and starts background re-encryption.
>>> 
>>> [1]
>>> 
>> 
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>> .
>>> 
>>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
 
 Pavel, Anton,
 
 How do you see the whole key rotation procedure will work? Clearly,
>>> during
 the re-encryption there will exist pages encrypted with both new and
>> old
 keys at the same 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Nikolay Izhikov
> Can you explain why such restriction is necessary ?

Re-encryption should have a minimal impact on the cluster.

> Most likely having a currently re-encrypting node serving only demand
> requests will have least performance impact on a grid.

The current design assumes that re-encryption will be started on all nodes
simultaneously.


Makes sense?

> On May 25, 2020, at 11:16, Alexei Scherbakov wrote:
> 
> I mean: serving supply requests.
> 
> On Mon, May 25, 2020 at 11:15, Alexei Scherbakov wrote:
> 
>> Nikolay,
>> 
>> Can you explain why such restriction is necessary ?
>> Most likely having a currently re-encrypting node serving only demand
>> requests will have least performance impact on a grid.
>> 
>> On Mon, May 25, 2020 at 11:08, Nikolay Izhikov wrote:
>> 
>>> Hello, Alexei.
>>> 
>>> I think we want to implement this feature without nodes restart.
>>> In the ideal scenario all nodes will stay alive and respond to the user
>>> requests.
>>> 
 On May 24, 2020, at 15:24, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
 
 Pavel Pereslegin,
 
 I see another opportunity.
 We can use rebalancing to re-encrypt node data with a new key.
 It's a trivial procedure for me: stop a node, clear database, change a
>>> key,
 start node and wait for rebalancing to complete.
 Data will be re-encrypted during rebalancing.
 
 Did I miss something ?
 
 On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:
 
> Folks,
> 
> Just keeping you informed: I and my colleagues are highly interested
>>> in TDE
> in general and keys rotations specifically, but we don't have enough
>>> time
> so far.
> We'll dive into this feature and participate in reviews next month.
> 
> --
> Best Regards,
> Ivan Rakov
> 
> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
> wrote:
> 
>> Hello, Alexey.
>> 
>>> is the encryption key for the data the same on all nodes in the
> cluster?
>> Yes, each encrypted cache group has its own encryption key, the key is
>> the same on all nodes.
>> 
>>> Clearly, during the re-encryption there will exist pages
>>> encrypted with both new and old keys at the same time.
>> Yes, there will be pages encrypted with different keys at the same
>>> time.
>> Currently, we only store one key for one cache group. To rotate a key,
>> at a certain point in time it is necessary to support several keys (at
>> least for reading the WAL).
>> For the "in place" strategy, we'll store the encryption key identifier
>> on each encrypted page (we currently have some unused space on
>> encrypted page, so I don't expect any memory overhead here). Thus, we
>> will have several keys for reading and one key for writing. I assume
>> that the old key will be automatically deleted when a specific WAL
>> segment is deleted (and re-encryption is finished).
>> 
>>> Will a node continue to re-encrypt the data after it restarts?
>> Yes.
>> 
>>> If a node goes down during the re-encryption, but the rest of the
>>> cluster finishes re-encryption, will we consider the procedure
> complete?
>> I'm not sure, but it looks like the key rotation is complete when we
>> set the new key on all nodes so that the updates will be encrypted
>> with the new key (as required by PCI DSS).
>> Status of re-encryption can be obtained separately (locally or cluster
>> wide).
>> 
>> I forgot to mention that with “in place” re-encryption it will be
>> impossible to quickly cancel re-encryption, because by canceling we
>> mean re-encryption with the old key.
>> 
>>> How do you see the whole key rotation procedure will work?
>> Initial design for re-encryption with "partition copying" is described
>> here [1]. I'll prepare detailed design for "in place" re-encryption if
>> we'll go this way. In short, send the new encryption key cluster-wide,
>> each node adds a new key and starts background re-encryption.
>> 
>> [1]
>> 
> 
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>> .
>> 
>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>>> 
>>> Pavel, Anton,
>>> 
>>> How do you see the whole key rotation procedure will work? Clearly,
>> during
>>> the re-encryption there will exist pages encrypted with both new and
> old
>>> keys at the same time. Will a node continue to re-encrypt the data
> after
>> it
>>> restarts? If a node goes down during the re-encryption, but the rest
>>> of
>> the
>>> cluster finishes re-encryption, will we consider the procedure
> complete?
>> By
>>> the way, is the encryption key for the data the same on all nodes in
> the
>>> cluster?
>>> 
>>> On Thu, May 14

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
And this approach is definitely much simpler to implement, because all
corner cases are handled by the rebalancing code.

On Mon, May 25, 2020 at 11:16, Alexei Scherbakov wrote:

> I mean: serving supply requests.
>
> On Mon, May 25, 2020 at 11:15, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>
>> Nikolay,
>>
>> Can you explain why such restriction is necessary ?
>> Most likely having a currently re-encrypting node serving only demand
>> requests will have least performance impact on a grid.
>>
>> On Mon, May 25, 2020 at 11:08, Nikolay Izhikov wrote:
>>
>>> Hello, Alexei.
>>>
>>> I think we want to implement this feature without nodes restart.
>>> In the ideal scenario all nodes will stay alive and respond to the user
>>> requests.
>>>
>>> > On May 24, 2020, at 15:24, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>>> >
>>> > Pavel Pereslegin,
>>> >
>>> > I see another opportunity.
>>> > We can use rebalancing to re-encrypt node data with a new key.
>>> > It's a trivial procedure for me: stop a node, clear database, change a
>>> key,
>>> > start node and wait for rebalancing to complete.
>>> > Data will be re-encrypted during rebalancing.
>>> >
>>> > Did I miss something ?
>>> >
>>> > On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:
>>> >
>>> >> Folks,
>>> >>
>>> >> Just keeping you informed: I and my colleagues are highly interested
>>> in TDE
>>> >> in general and keys rotations specifically, but we don't have enough
>>> time
>>> >> so far.
>>> >> We'll dive into this feature and participate in reviews next month.
>>> >>
>>> >> --
>>> >> Best Regards,
>>> >> Ivan Rakov
>>> >>
>>> >> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
>>> >> wrote:
>>> >>
>>> >>> Hello, Alexey.
>>> >>>
>>>  is the encryption key for the data the same on all nodes in the
>>> >> cluster?
>>> >>> Yes, each encrypted cache group has its own encryption key, the key
>>> is
>>> >>> the same on all nodes.
>>> >>>
>>>  Clearly, during the re-encryption there will exist pages
>>>  encrypted with both new and old keys at the same time.
>>> >>> Yes, there will be pages encrypted with different keys at the same
>>> time.
>>> >>> Currently, we only store one key for one cache group. To rotate a
>>> key,
>>> >>> at a certain point in time it is necessary to support several keys
>>> (at
>>> >>> least for reading the WAL).
>>> >>> For the "in place" strategy, we'll store the encryption key
>>> identifier
>>> >>> on each encrypted page (we currently have some unused space on
>>> >>> encrypted page, so I don't expect any memory overhead here). Thus, we
>>> >>> will have several keys for reading and one key for writing. I assume
>>> >>> that the old key will be automatically deleted when a specific WAL
>>> >>> segment is deleted (and re-encryption is finished).
>>> >>>
>>>  Will a node continue to re-encrypt the data after it restarts?
>>> >>> Yes.
>>> >>>
>>>  If a node goes down during the re-encryption, but the rest of the
>>>  cluster finishes re-encryption, will we consider the procedure
>>> >> complete?
>>> >>> I'm not sure, but it looks like the key rotation is complete when we
>>> >>> set the new key on all nodes so that the updates will be encrypted
>>> >>> with the new key (as required by PCI DSS).
>>> >>> Status of re-encryption can be obtained separately (locally or
>>> cluster
>>> >>> wide).
>>> >>>
>>> >>> I forgot to mention that with “in place” re-encryption it will be
>>> >>> impossible to quickly cancel re-encryption, because by canceling we
>>> >>> mean re-encryption with the old key.
>>> >>>
>>>  How do you see the whole key rotation procedure will work?
>>> >>> Initial design for re-encryption with "partition copying" is
>>> described
>>> >>> here [1]. I'll prepare detailed design for "in place" re-encryption
>>> if
>>> >>> we'll go this way. In short, send the new encryption key
>>> cluster-wide,
>>> >>> each node adds a new key and starts background re-encryption.
>>> >>>
>>> >>> [1]
>>> >>>
>>> >>
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>> >>> .
>>> >>>
>>> >>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>>> 
>>>  Pavel, Anton,
>>> 
>>>  How do you see the whole key rotation procedure will work? Clearly,
>>> >>> during
>>>  the re-encryption there will exist pages encrypted with both new and
>>> >> old
>>>  keys at the same time. Will a node continue to re-encrypt the data
>>> >> after
>>> >>> it
>>>  restarts? If a node goes down during the re-encryption, but the
>>> rest of
>>> >>> the
>>>  cluster finishes re-encryption, will we consider the procedure
>>> >> complete?
>>> >>> By
>>>  the way, is the encryption key for the data the same on all nodes in
>>> >> the
>>>  cluster?
>>> 
>>>  On Thu, May 14, 2020 at 11:30, Anton Vinogradov wrote:
>>> 
>>> > +1 to "In place re-encryption".
>>> >
>>> > - It 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
I mean: serving supply requests.

On Mon, May 25, 2020 at 11:15, Alexei Scherbakov wrote:

> Nikolay,
>
> Can you explain why such restriction is necessary ?
> Most likely having a currently re-encrypting node serving only demand
> requests will have least performance impact on a grid.
>
> On Mon, May 25, 2020 at 11:08, Nikolay Izhikov wrote:
>
>> Hello, Alexei.
>>
>> I think we want to implement this feature without nodes restart.
>> In the ideal scenario all nodes will stay alive and respond to the user
>> requests.
>>
>> > On May 24, 2020, at 15:24, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>> >
>> > Pavel Pereslegin,
>> >
>> > I see another opportunity.
>> > We can use rebalancing to re-encrypt node data with a new key.
>> > It's a trivial procedure for me: stop a node, clear database, change a
>> key,
>> > start node and wait for rebalancing to complete.
>> > Data will be re-encrypted during rebalancing.
>> >
>> > Did I miss something ?
>> >
>> > On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:
>> >
>> >> Folks,
>> >>
>> >> Just keeping you informed: I and my colleagues are highly interested
>> in TDE
>> >> in general and keys rotations specifically, but we don't have enough
>> time
>> >> so far.
>> >> We'll dive into this feature and participate in reviews next month.
>> >>
>> >> --
>> >> Best Regards,
>> >> Ivan Rakov
>> >>
>> >> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
>> >> wrote:
>> >>
>> >>> Hello, Alexey.
>> >>>
>>  is the encryption key for the data the same on all nodes in the
>> >> cluster?
>> >>> Yes, each encrypted cache group has its own encryption key, the key is
>> >>> the same on all nodes.
>> >>>
>>  Clearly, during the re-encryption there will exist pages
>>  encrypted with both new and old keys at the same time.
>> >>> Yes, there will be pages encrypted with different keys at the same
>> time.
>> >>> Currently, we only store one key for one cache group. To rotate a key,
>> >>> at a certain point in time it is necessary to support several keys (at
>> >>> least for reading the WAL).
>> >>> For the "in place" strategy, we'll store the encryption key identifier
>> >>> on each encrypted page (we currently have some unused space on
>> >>> encrypted page, so I don't expect any memory overhead here). Thus, we
>> >>> will have several keys for reading and one key for writing. I assume
>> >>> that the old key will be automatically deleted when a specific WAL
>> >>> segment is deleted (and re-encryption is finished).
>> >>>
>>  Will a node continue to re-encrypt the data after it restarts?
>> >>> Yes.
>> >>>
>>  If a node goes down during the re-encryption, but the rest of the
>>  cluster finishes re-encryption, will we consider the procedure
>> >> complete?
>> >>> I'm not sure, but it looks like the key rotation is complete when we
>> >>> set the new key on all nodes so that the updates will be encrypted
>> >>> with the new key (as required by PCI DSS).
>> >>> Status of re-encryption can be obtained separately (locally or cluster
>> >>> wide).
>> >>>
>> >>> I forgot to mention that with “in place” re-encryption it will be
>> >>> impossible to quickly cancel re-encryption, because by canceling we
>> >>> mean re-encryption with the old key.
>> >>>
>>  How do you see the whole key rotation procedure will work?
>> >>> Initial design for re-encryption with "partition copying" is described
>> >>> here [1]. I'll prepare detailed design for "in place" re-encryption if
>> >>> we'll go this way. In short, send the new encryption key cluster-wide,
>> >>> each node adds a new key and starts background re-encryption.
>> >>>
>> >>> [1]
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>> >>> .
>> >>>
>> >>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>> 
>>  Pavel, Anton,
>> 
>>  How do you see the whole key rotation procedure will work? Clearly,
>> >>> during
>>  the re-encryption there will exist pages encrypted with both new and
>> >> old
>>  keys at the same time. Will a node continue to re-encrypt the data
>> >> after
>> >>> it
>>  restarts? If a node goes down during the re-encryption, but the rest
>> of
>> >>> the
>>  cluster finishes re-encryption, will we consider the procedure
>> >> complete?
>> >>> By
>>  the way, is the encryption key for the data the same on all nodes in
>> >> the
>>  cluster?
>> 
>>  On Thu, May 14, 2020 at 11:30, Anton Vinogradov wrote:
>> 
>> > +1 to "In place re-encryption".
>> >
>> > - It has a simple design.
>> > - Clusters under load may require just load to re-encrypt the data.
>> > (Friendly to load).
>> > - Easy to throttle.
>> > - Easy to continue.
>> > - Design compatible with the multi-key architecture.
>> > - It can be optimized to use own WAL buffer and to re-encrypt pages
>> >>> without
>> > restoring them 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Alexei Scherbakov
Nikolay,

Can you explain why such a restriction is necessary?
Most likely, having a currently re-encrypting node serve only demand
requests will have the least performance impact on the grid.

On Mon, May 25, 2020 at 11:08, Nikolay Izhikov wrote:

> Hello, Alexei.
>
> I think we want to implement this feature without nodes restart.
> In the ideal scenario all nodes will stay alive and respond to the user
> requests.
>
> > On May 24, 2020, at 15:24, Alexei Scherbakov wrote:
> >
> > Pavel Pereslegin,
> >
> > I see another opportunity.
> > We can use rebalancing to re-encrypt node data with a new key.
> > It's a trivial procedure for me: stop a node, clear database, change a
> key,
> > start node and wait for rebalancing to complete.
> > Data will be re-encrypted during rebalancing.
> >
> > Did I miss something ?
> >
> > On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:
> >
> >> Folks,
> >>
> >> Just keeping you informed: I and my colleagues are highly interested in
> TDE
> >> in general and keys rotations specifically, but we don't have enough
> time
> >> so far.
> >> We'll dive into this feature and participate in reviews next month.
> >>
> >> --
> >> Best Regards,
> >> Ivan Rakov
> >>
> >> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
> >> wrote:
> >>
> >>> Hello, Alexey.
> >>>
>  is the encryption key for the data the same on all nodes in the
> >> cluster?
> >>> Yes, each encrypted cache group has its own encryption key, the key is
> >>> the same on all nodes.
> >>>
>  Clearly, during the re-encryption there will exist pages
>  encrypted with both new and old keys at the same time.
> >>> Yes, there will be pages encrypted with different keys at the same
> time.
> >>> Currently, we only store one key for one cache group. To rotate a key,
> >>> at a certain point in time it is necessary to support several keys (at
> >>> least for reading the WAL).
> >>> For the "in place" strategy, we'll store the encryption key identifier
> >>> on each encrypted page (we currently have some unused space on
> >>> encrypted page, so I don't expect any memory overhead here). Thus, we
> >>> will have several keys for reading and one key for writing. I assume
> >>> that the old key will be automatically deleted when a specific WAL
> >>> segment is deleted (and re-encryption is finished).
> >>>
>  Will a node continue to re-encrypt the data after it restarts?
> >>> Yes.
> >>>
>  If a node goes down during the re-encryption, but the rest of the
>  cluster finishes re-encryption, will we consider the procedure
> >> complete?
> >>> I'm not sure, but it looks like the key rotation is complete when we
> >>> set the new key on all nodes so that the updates will be encrypted
> >>> with the new key (as required by PCI DSS).
> >>> Status of re-encryption can be obtained separately (locally or cluster
> >>> wide).
> >>>
> >>> I forgot to mention that with “in place” re-encryption it will be
> >>> impossible to quickly cancel re-encryption, because by canceling we
> >>> mean re-encryption with the old key.
> >>>
>  How do you see the whole key rotation procedure will work?
> >>> Initial design for re-encryption with "partition copying" is described
> >>> here [1]. I'll prepare detailed design for "in place" re-encryption if
> >>> we'll go this way. In short, send the new encryption key cluster-wide,
> >>> each node adds a new key and starts background re-encryption.
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> >>> .
> >>>
> >>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> 
>  Pavel, Anton,
> 
>  How do you see the whole key rotation procedure will work? Clearly,
> >>> during
>  the re-encryption there will exist pages encrypted with both new and
> >> old
>  keys at the same time. Will a node continue to re-encrypt the data
> >> after
> >>> it
>  restarts? If a node goes down during the re-encryption, but the rest
> of
> >>> the
>  cluster finishes re-encryption, will we consider the procedure
> >> complete?
> >>> By
>  the way, is the encryption key for the data the same on all nodes in
> >> the
>  cluster?
> 
>  On Thu, May 14, 2020 at 11:30, Anton Vinogradov wrote:
> 
> > +1 to "In place re-encryption".
> >
> > - It has a simple design.
> > - Clusters under load may require just load to re-encrypt the data.
> > (Friendly to load).
> > - Easy to throttle.
> > - Easy to continue.
> > - Design compatible with the multi-key architecture.
> > - It can be optimized to use own WAL buffer and to re-encrypt pages
> >>> without
> > restoring them to on-heap.
> >
> > On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin 
> >>> wrote:
> >
> >> Hello Igniters.
> >>
> >> Recently, master key rotation for Apache Ignite Transparent Data
> >> Encryption was implemented [1], 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-25 Thread Nikolay Izhikov
Hello, Alexei.

I think we want to implement this feature without restarting nodes.
In the ideal scenario, all nodes stay alive and keep responding to user
requests.

> On May 24, 2020, at 15:24, Alexei Scherbakov wrote:
> 
> Pavel Pereslegin,
> 
> I see another opportunity.
> We can use rebalancing to re-encrypt node data with a new key.
> It's a trivial procedure for me: stop a node, clear database, change a key,
> start node and wait for rebalancing to complete.
> Data will be re-encrypted during rebalancing.
> 
> Did I miss something ?
> 
> On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:
> 
>> Folks,
>> 
>> Just keeping you informed: I and my colleagues are highly interested in TDE
>> in general and keys rotations specifically, but we don't have enough time
>> so far.
>> We'll dive into this feature and participate in reviews next month.
>> 
>> --
>> Best Regards,
>> Ivan Rakov
>> 
>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
>> wrote:
>> 
>>> Hello, Alexey.
>>> 
 is the encryption key for the data the same on all nodes in the
>> cluster?
>>> Yes, each encrypted cache group has its own encryption key, the key is
>>> the same on all nodes.
>>> 
 Clearly, during the re-encryption there will exist pages
 encrypted with both new and old keys at the same time.
>>> Yes, there will be pages encrypted with different keys at the same time.
>>> Currently, we only store one key for one cache group. To rotate a key,
>>> at a certain point in time it is necessary to support several keys (at
>>> least for reading the WAL).
>>> For the "in place" strategy, we'll store the encryption key identifier
>>> on each encrypted page (we currently have some unused space on
>>> encrypted page, so I don't expect any memory overhead here). Thus, we
>>> will have several keys for reading and one key for writing. I assume
>>> that the old key will be automatically deleted when a specific WAL
>>> segment is deleted (and re-encryption is finished).
>>> 
 Will a node continue to re-encrypt the data after it restarts?
>>> Yes.
>>> 
 If a node goes down during the re-encryption, but the rest of the
 cluster finishes re-encryption, will we consider the procedure
>> complete?
>>> I'm not sure, but it looks like the key rotation is complete when we
>>> set the new key on all nodes so that the updates will be encrypted
>>> with the new key (as required by PCI DSS).
>>> Status of re-encryption can be obtained separately (locally or cluster
>>> wide).
>>> 
>>> I forgot to mention that with “in place” re-encryption it will be
>>> impossible to quickly cancel re-encryption, because by canceling we
>>> mean re-encryption with the old key.
>>> 
 How do you see the whole key rotation procedure will work?
>>> Initial design for re-encryption with "partition copying" is described
>>> here [1]. I'll prepare detailed design for "in place" re-encryption if
>>> we'll go this way. In short, send the new encryption key cluster-wide,
>>> each node adds a new key and starts background re-encryption.
>>> 
>>> [1]
>>> 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>> .
>>> 
>>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk wrote:
 
 Pavel, Anton,
 
 How do you see the whole key rotation procedure will work? Clearly,
>>> during
 the re-encryption there will exist pages encrypted with both new and
>> old
 keys at the same time. Will a node continue to re-encrypt the data
>> after
>>> it
 restarts? If a node goes down during the re-encryption, but the rest of
>>> the
 cluster finishes re-encryption, will we consider the procedure
>> complete?
>>> By
 the way, is the encryption key for the data the same on all nodes in
>> the
 cluster?
 
 On Thu, May 14, 2020 at 11:30, Anton Vinogradov wrote:
 
> +1 to "In place re-encryption".
> 
> - It has a simple design.
> - Clusters under load may require just load to re-encrypt the data.
> (Friendly to load).
> - Easy to throttle.
> - Easy to continue.
> - Design compatible with the multi-key architecture.
> - It can be optimized to use own WAL buffer and to re-encrypt pages
>>> without
> restoring them to on-heap.
> 
> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin 
>>> wrote:
> 
>> Hello Igniters.
>> 
>> Recently, master key rotation for Apache Ignite Transparent Data
>> Encryption was implemented [1], but some security standards (PCI
>> DSS
>> at least) require rotation of all encryption keys [2]. Currently,
>> encryption occurs when reading/writing pages to disk, cache
>>> encryption
>> keys are stored in metastore.
>> 
>> I'm going to contribute cache encryption key rotation and want to
>> consult what is the best way to re-encrypting existing data, I see
>>> two
>> different strategies.
>> 
>> 1. In place re-encryption:
>> Using the old key, 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-24 Thread Alexei Scherbakov
Pavel Pereslegin,

I see another option.
We can use rebalancing to re-encrypt node data with a new key.
It looks like a trivial procedure to me: stop a node, clear its database,
change the key, start the node and wait for rebalancing to complete.
The data will be re-encrypted during rebalancing.

Did I miss something?
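
The procedure above can be illustrated with a toy in-memory model. This is only a sketch under loud assumptions: it is written in Python rather than against Ignite's Java internals, the names Node, rebalance, rotate_by_rebalancing and make_cluster are hypothetical, every node is assumed to hold a copy of every partition, and "encryption" is reduced to remembering which key id a copy was written with.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    key_id: int                       # key this node uses for everything it writes
    online: bool = True
    # partition id -> (key id the local copy is encrypted with, payload)
    store: dict = field(default_factory=dict)


def rebalance(node, cluster):
    """Refill `node` from online peers; incoming data is written with node.key_id."""
    for peer in cluster:
        if peer is node or not peer.online:
            continue
        for part, (_, payload) in peer.store.items():
            node.store.setdefault(part, (node.key_id, payload))


def rotate_by_rebalancing(cluster, new_key_id):
    """Stop, wipe, re-key, restart and rebalance the nodes one at a time."""
    for node in cluster:
        node.online = False           # stop the node
        node.store.clear()            # clear its database
        node.key_id = new_key_id      # change the key
        node.online = True            # start the node
        rebalance(node, cluster)      # wait for rebalancing to complete


def make_cluster(size, partitions, key_id):
    """Toy cluster in which every node holds a copy of every partition."""
    return [
        Node(f"node-{i}", key_id, store={p: (key_id, f"data-{p}") for p in range(partitions)})
        for i in range(size)
    ]


if __name__ == "__main__":
    cluster = make_cluster(size=3, partitions=4, key_id=1)
    rotate_by_rebalancing(cluster, new_key_id=2)
    print({n.name: sorted({k for k, _ in n.store.values()}) for n in cluster})
```

In this model a node holds no data between the wipe and the end of rebalancing, so the procedure only preserves data when at least one other online node owns a copy of each partition; with a single node the loop would simply erase everything.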

On Fri, May 22, 2020 at 16:14, Ivan Rakov wrote:

> Folks,
>
> Just keeping you informed: I and my colleagues are highly interested in TDE
> in general and keys rotations specifically, but we don't have enough time
> so far.
> We'll dive into this feature and participate in reviews next month.
>
> --
> Best Regards,
> Ivan Rakov
>
> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin 
> wrote:
>
> > Hello, Alexey.
> >
> > > is the encryption key for the data the same on all nodes in the
> cluster?
> > Yes, each encrypted cache group has its own encryption key, the key is
> > the same on all nodes.
> >
> > > Clearly, during the re-encryption there will exist pages
> > > encrypted with both new and old keys at the same time.
> > Yes, there will be pages encrypted with different keys at the same time.
> > Currently, we only store one key for one cache group. To rotate a key,
> > at a certain point in time it is necessary to support several keys (at
> > least for reading the WAL).
> > For the "in place" strategy, we'll store the encryption key identifier
> > on each encrypted page (we currently have some unused space on
> > encrypted page, so I don't expect any memory overhead here). Thus, we
> > will have several keys for reading and one key for writing. I assume
> > that the old key will be automatically deleted when a specific WAL
> > segment is deleted (and re-encryption is finished).
> >
> > > Will a node continue to re-encrypt the data after it restarts?
> > Yes.
> >
> > > If a node goes down during the re-encryption, but the rest of the
> > > cluster finishes re-encryption, will we consider the procedure
> complete?
> > I'm not sure, but it looks like the key rotation is complete when we
> > set the new key on all nodes so that the updates will be encrypted
> > with the new key (as required by PCI DSS).
> > Status of re-encryption can be obtained separately (locally or cluster
> > wide).
> >
> > I forgot to mention that with “in place” re-encryption it will be
> > impossible to quickly cancel re-encryption, because by canceling we
> > mean re-encryption with the old key.
> >
> > > How do you see the whole key rotation procedure will work?
> > Initial design for re-encryption with "partition copying" is described
> > here [1]. I'll prepare detailed design for "in place" re-encryption if
> > we'll go this way. In short, send the new encryption key cluster-wide,
> > each node adds a new key and starts background re-encryption.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> > .
> >
> > On Sun, May 17, 2020 at 18:35, Alexey Goncharuk wrote:
> > >
> > > Pavel, Anton,
> > >
> > > How do you see the whole key rotation procedure will work? Clearly,
> > during
> > > the re-encryption there will exist pages encrypted with both new and
> old
> > > keys at the same time. Will a node continue to re-encrypt the data
> after
> > it
> > > restarts? If a node goes down during the re-encryption, but the rest of
> > the
> > > cluster finishes re-encryption, will we consider the procedure
> complete?
> > By
> > > the way, is the encryption key for the data the same on all nodes in
> the
> > > cluster?
> > >
> > > On Thu, May 14, 2020 at 11:30, Anton Vinogradov wrote:
> > >
> > > > +1 to "In place re-encryption".
> > > >
> > > > - It has a simple design.
> > > > - Clusters under load may require just load to re-encrypt the data.
> > > > (Friendly to load).
> > > > - Easy to throttle.
> > > > - Easy to continue.
> > > > - Design compatible with the multi-key architecture.
> > > > - It can be optimized to use own WAL buffer and to re-encrypt pages
> > without
> > > > restoring them to on-heap.
> > > >
> > > > On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin 
> > wrote:
> > > >
> > > > > Hello Igniters.
> > > > >
> > > > > Recently, master key rotation for Apache Ignite Transparent Data
> > > > > Encryption was implemented [1], but some security standards (PCI
> DSS
> > > > > at least) require rotation of all encryption keys [2]. Currently,
> > > > > encryption occurs when reading/writing pages to disk, cache
> > encryption
> > > > > keys are stored in metastore.
> > > > >
> > > > > I'm going to contribute cache encryption key rotation and want to
> > > > > consult what is the best way to re-encrypting existing data, I see
> > two
> > > > > different strategies.
> > > > >
> > > > > 1. In place re-encryption:
> > > > > Using the old key, sequentially read all the pages from the
> > datastore,
> > > > > mark as dirty and log them into the WAL. After checkpoint pages
> will
> > > > > be stored to disk encrypted with the new key (as usual, along with
> 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-22 Thread Ivan Rakov
Folks,

Just keeping you informed: my colleagues and I are highly interested in TDE
in general and in key rotation specifically, but we haven't had enough time
so far.
We'll dive into this feature and participate in reviews next month.

--
Best Regards,
Ivan Rakov

On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin  wrote:

> Hello, Alexey.
>
> > is the encryption key for the data the same on all nodes in the cluster?
> Yes, each encrypted cache group has its own encryption key, the key is
> the same on all nodes.
>
> > Clearly, during the re-encryption there will exist pages
> > encrypted with both new and old keys at the same time.
> Yes, there will be pages encrypted with different keys at the same time.
> Currently, we store only one key per cache group. To rotate a key, we
> need to support several keys at some point in time (at least for
> reading the WAL).
> For the "in place" strategy, we'll store the encryption key identifier
> on each encrypted page (there is currently some unused space on each
> encrypted page, so I don't expect any memory overhead here). Thus, we
> will have several keys for reading and one key for writing. I assume
> that the old key will be automatically deleted when a specific WAL
> segment is deleted (and re-encryption is finished).
>
> > Will a node continue to re-encrypt the data after it restarts?
> Yes.
>
> > If a node goes down during the re-encryption, but the rest of the
> > cluster finishes re-encryption, will we consider the procedure complete?
> I'm not sure, but it looks like the key rotation is complete when we
> set the new key on all nodes so that the updates will be encrypted
> with the new key (as required by PCI DSS).
> Status of re-encryption can be obtained separately (locally or cluster
> wide).
>
> I forgot to mention that with “in place” re-encryption it will be
> impossible to quickly cancel re-encryption, because by canceling we
> mean re-encryption with the old key.
>
> > How do you see the whole key rotation procedure will work?
> Initial design for re-encryption with "partition copying" is described
> here [1]. I'll prepare detailed design for "in place" re-encryption if
> we'll go this way. In short, send the new encryption key cluster-wide,
> each node adds a new key and starts background re-encryption.
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>
> On Sun, May 17, 2020 at 6:35 PM Alexey Goncharuk wrote:
> >
> > Pavel, Anton,
> >
> > How do you see the whole key rotation procedure will work? Clearly, during
> > the re-encryption there will exist pages encrypted with both new and old
> > keys at the same time. Will a node continue to re-encrypt the data after it
> > restarts? If a node goes down during the re-encryption, but the rest of the
> > cluster finishes re-encryption, will we consider the procedure complete? By
> > the way, is the encryption key for the data the same on all nodes in the
> > cluster?
> >
> > On Thu, May 14, 2020 at 11:30 AM Anton Vinogradov wrote:
> >
> > > +1 to "In place re-encryption".
> > >
> > > - It has a simple design.
> > > - Clusters under load may require just load to re-encrypt the data.
> > > (Friendly to load).
> > > - Easy to throttle.
> > > - Easy to continue.
> > > - Design compatible with the multi-key architecture.
> > > - It can be optimized to use own WAL buffer and to re-encrypt pages without
> > > restoring them to on-heap.
> > >
> > > On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin wrote:
> > >
> > > > Hello Igniters.
> > > >
> > > > Recently, master key rotation for Apache Ignite Transparent Data
> > > > Encryption was implemented [1], but some security standards (PCI DSS
> > > > at least) require rotation of all encryption keys [2]. Currently,
> > > > encryption occurs when reading/writing pages to disk, cache encryption
> > > > keys are stored in metastore.
> > > >
> > > > I'm going to contribute cache encryption key rotation and want to
> > > > consult what is the best way to re-encrypting existing data, I see two
> > > > different strategies.
> > > >
> > > > 1. In place re-encryption:
> > > > Using the old key, sequentially read all the pages from the datastore,
> > > > mark as dirty and log them into the WAL. After checkpoint pages will
> > > > be stored to disk encrypted with the new key (as usual, along with
> > > > updates). This strategy requires store the identifier (number) of the
> > > > encryption key into the encrypted page.
> > > > pros:
> > > >   - can work in the background with minimal performance impact (this
> > > > impact can be managed).
> > > > cons:
> > > >   - page duplication in the WAL may affect performance and historical
> > > > rebalance.
> > > >
> > > > 2. Copy partition with re-encryption.
> > > > This strategy is similar to partition snapshotting [3] - create
> > > > partition copy encrypted with the new key and then replace the
> > > > original partition 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-17 Thread Pavel Pereslegin
Hello, Alexey.

> is the encryption key for the data the same on all nodes in the cluster?
Yes, each encrypted cache group has its own encryption key, the key is
the same on all nodes.

> Clearly, during the re-encryption there will exist pages
> encrypted with both new and old keys at the same time.
Yes, there will be pages encrypted with different keys at the same time.
Currently, we only store one key for one cache group. To rotate a key,
at a certain point in time it is necessary to support several keys (at
least for reading the WAL).
For the "in place" strategy, we'll store the encryption key identifier
on each encrypted page (we currently have some unused space on each
encrypted page, so I don't expect any memory overhead here). Thus, we
will have several keys for reading and one key for writing. I assume
that the old key will be deleted automatically once re-encryption is
finished and the last WAL segment that needs it has been deleted.
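
For illustration only, the "several keys for reading, one key for writing"
idea could be sketched in Java roughly as follows. This is a minimal sketch
under stated assumptions, not Ignite code: the class GroupKeyRingSketch, its
methods and the stored layout [keyId][IV][ciphertext] are hypothetical, and
AES/GCM is chosen only to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical per-cache-group key ring: many keys for reading, one for writing. */
final class GroupKeyRingSketch {
    private static final int IV_LEN = 12;    // GCM nonce length in bytes
    private static final int TAG_BITS = 128; // GCM authentication tag length in bits

    private final Map<Byte, SecretKey> keys = new ConcurrentHashMap<>();
    private final SecureRandom rnd = new SecureRandom();
    private volatile byte activeKeyId;

    /** Registers a key and makes it the only key used for writing. */
    void rotateTo(byte keyId, SecretKey key) {
        keys.put(keyId, key);
        activeKeyId = keyId;
    }

    /** Drops an old key once no page or WAL record needs it any more. */
    void forget(byte keyId) {
        if (keyId == activeKeyId)
            throw new IllegalArgumentException("Cannot remove the active key");
        keys.remove(keyId);
    }

    /** Stored layout (an assumption of this sketch): [keyId][IV][ciphertext + tag]. */
    byte[] encryptPage(byte[] plain) {
        byte keyId = activeKeyId;
        byte[] iv = new byte[IV_LEN];
        rnd.nextBytes(iv);
        byte[] body = crypt(Cipher.ENCRYPT_MODE, keys.get(keyId), iv, plain, 0, plain.length);
        return ByteBuffer.allocate(1 + IV_LEN + body.length).put(keyId).put(iv).put(body).array();
    }

    /** Picks the decryption key by the identifier stored with the page. */
    byte[] decryptPage(byte[] stored) {
        SecretKey key = keys.get(stored[0]);
        if (key == null)
            throw new IllegalStateException("Unknown key id: " + stored[0]);
        byte[] iv = Arrays.copyOfRange(stored, 1, 1 + IV_LEN);
        return crypt(Cipher.DECRYPT_MODE, key, iv, stored, 1 + IV_LEN, stored.length - 1 - IV_LEN);
    }

    static byte keyIdOf(byte[] stored) {
        return stored[0];
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] in, int off, int len) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(TAG_BITS, iv));
            return cipher.doFinal(in, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GroupKeyRingSketch ring = new GroupKeyRingSketch();
        ring.rotateTo((byte) 1, newAesKey());
        byte[] oldPage = ring.encryptPage("written before rotation".getBytes(StandardCharsets.UTF_8));
        ring.rotateTo((byte) 2, newAesKey());
        byte[] newPage = ring.encryptPage("written after rotation".getBytes(StandardCharsets.UTF_8));
        // Both pages stay readable while re-encryption is still in progress.
        System.out.println(new String(ring.decryptPage(oldPage), StandardCharsets.UTF_8));
        System.out.println(new String(ring.decryptPage(newPage), StandardCharsets.UTF_8));
    }
}
```

In this sketch forget() would be called only after every page has been
rewritten with the new key and no WAL segment still references the old one.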

> Will a node continue to re-encrypt the data after it restarts?
Yes.

> If a node goes down during the re-encryption, but the rest of the
> cluster finishes re-encryption, will we consider the procedure complete?
I'm not sure, but it looks like the key rotation is complete when we
set the new key on all nodes so that the updates will be encrypted
with the new key (as required by PCI DSS).
The status of re-encryption can be obtained separately (locally or cluster-wide).

I forgot to mention that with "in place" re-encryption it will be
impossible to cancel re-encryption quickly, because canceling means
re-encrypting with the old key.

> How do you see the whole key rotation procedure will work?
The initial design for re-encryption with "partition copying" is described
here [1]. I'll prepare a detailed design for "in place" re-encryption if
we go this way. In short: the new encryption key is sent cluster-wide,
each node adds the new key and starts background re-encryption.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign

On Sun, May 17, 2020 at 6:35 PM Alexey Goncharuk wrote:
>
> Pavel, Anton,
>
> How do you see the whole key rotation procedure will work? Clearly, during
> the re-encryption there will exist pages encrypted with both new and old
> keys at the same time. Will a node continue to re-encrypt the data after it
> restarts? If a node goes down during the re-encryption, but the rest of the
> cluster finishes re-encryption, will we consider the procedure complete? By
> the way, is the encryption key for the data the same on all nodes in the
> cluster?
>
> On Thu, May 14, 2020 at 11:30 AM Anton Vinogradov wrote:
>
> > +1 to "In place re-encryption".
> >
> > - It has a simple design.
> > - Clusters under load may require just load to re-encrypt the data.
> > (Friendly to load).
> > - Easy to throttle.
> > - Easy to continue.
> > - Design compatible with the multi-key architecture.
> > - It can be optimized to use own WAL buffer and to re-encrypt pages without
> > restoring them to on-heap.
> >
> > On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin  wrote:
> >
> > > Hello Igniters.
> > >
> > > Recently, master key rotation for Apache Ignite Transparent Data
> > > Encryption was implemented [1], but some security standards (PCI DSS
> > > at least) require rotation of all encryption keys [2]. Currently,
> > > encryption occurs when reading/writing pages to disk, cache encryption
> > > keys are stored in metastore.
> > >
> > > I'm going to contribute cache encryption key rotation and want to
> > > consult what is the best way to re-encrypting existing data, I see two
> > > different strategies.
> > >
> > > 1. In place re-encryption:
> > > Using the old key, sequentially read all the pages from the datastore,
> > > mark as dirty and log them into the WAL. After checkpoint pages will
> > > be stored to disk encrypted with the new key (as usual, along with
> > > updates). This strategy requires store the identifier (number) of the
> > > encryption key into the encrypted page.
> > > pros:
> > >   - can work in the background with minimal performance impact (this
> > > impact can be managed).
> > > cons:
> > >   - page duplication in the WAL may affect performance and historical
> > > rebalance.
> > >
> > > 2. Copy partition with re-encryption.
> > > This strategy is similar to partition snapshotting [3] - create
> > > partition copy encrypted with the new key and then replace the
> > > original partition file with the new one (see details [4]).
> > > pros:
> > >   - should work faster than "in place" re-encryption.
> > > cons:
> > >   - re-encryption in active cluster (and on unstable topology) can be
> > > difficult to implement.
> > >
> > > (See more detailed comparison [5])
> > >
> > > Re-encryption of existing data is a long and rare procedure (It is
> > > recommended to change the key every 6 months, but at least once every
> > > 2 years). Thus, re-encryption can be implemented for maintenance mode
> > > (for example, on a 

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-17 Thread Alexey Goncharuk
Pavel, Anton,

How do you see the whole key rotation procedure working? Clearly, during
the re-encryption there will be pages encrypted with both new and old
keys at the same time. Will a node continue to re-encrypt the data after it
restarts? If a node goes down during the re-encryption, but the rest of the
cluster finishes re-encryption, will we consider the procedure complete? By
the way, is the encryption key for the data the same on all nodes in the
cluster?

On Thu, May 14, 2020 at 11:30 AM Anton Vinogradov wrote:

> +1 to "In place re-encryption".
>
> - It has a simple design.
> - Clusters under load may require just load to re-encrypt the data.
> (Friendly to load).
> - Easy to throttle.
> - Easy to continue.
> - Design compatible with the multi-key architecture.
> - It can be optimized to use own WAL buffer and to re-encrypt pages without
> restoring them to on-heap.
>
> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin  wrote:
>
> > Hello Igniters.
> >
> > Recently, master key rotation for Apache Ignite Transparent Data
> > Encryption was implemented [1], but some security standards (PCI DSS
> > at least) require rotation of all encryption keys [2]. Currently,
> > encryption occurs when reading/writing pages to disk, cache encryption
> > keys are stored in metastore.
> >
> > I'm going to contribute cache encryption key rotation and want to
> > consult what is the best way to re-encrypting existing data, I see two
> > different strategies.
> >
> > 1. In place re-encryption:
> > Using the old key, sequentially read all the pages from the datastore,
> > mark as dirty and log them into the WAL. After checkpoint pages will
> > be stored to disk encrypted with the new key (as usual, along with
> > updates). This strategy requires store the identifier (number) of the
> > encryption key into the encrypted page.
> > pros:
> >   - can work in the background with minimal performance impact (this
> > impact can be managed).
> > cons:
> >   - page duplication in the WAL may affect performance and historical
> > rebalance.
> >
> > 2. Copy partition with re-encryption.
> > This strategy is similar to partition snapshotting [3] - create
> > partition copy encrypted with the new key and then replace the
> > original partition file with the new one (see details [4]).
> > pros:
> >   - should work faster than "in place" re-encryption.
> > cons:
> >   - re-encryption in active cluster (and on unstable topology) can be
> > difficult to implement.
> >
> > (See more detailed comparison [5])
> >
> > Re-encryption of existing data is a long and rare procedure (It is
> > recommended to change the key every 6 months, but at least once every
> > 2 years). Thus, re-encryption can be implemented for maintenance mode
> > (for example, on a stable topology in a read-only cluster) and in such
> > case the approach with partition copying seems simpler and faster.
> >
> > So, what do you think - do we need "online" re-encryption and which of
> > the proposed options is best suited for this?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-12186
> > [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
> > [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
> > [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> > [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
> >
>


Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-14 Thread Anton Vinogradov
+1 to "In place re-encryption".

- It has a simple design.
- In clusters under load, the load itself may be enough to re-encrypt the
data (load-friendly).
- Easy to throttle.
- Easy to continue.
- Design compatible with the multi-key architecture.
- It can be optimized to use its own WAL buffer and to re-encrypt pages
without restoring them on-heap.
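
As an illustration of the "easy to throttle" and "easy to continue" points,
a background scan could look roughly like the Java sketch below. All names
(ThrottledReencryptionSketch, markDirty, the cursor) are hypothetical and
not Ignite APIs; the sketch only assumes that marking a page dirty is
enough for the next checkpoint to write it with the new key, and that a
real implementation would persist the cursor so a restarted node can resume.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Hypothetical throttled, resumable scan over the pages of one partition. */
final class ThrottledReencryptionSketch {
    private final long totalPages;
    private final int batchSize;
    private final long pauseMillis;
    private final AtomicLong cursor; // index of the next page to process

    ThrottledReencryptionSketch(long totalPages, long resumeFrom, int batchSize, long pauseMillis) {
        this.totalPages = totalPages;
        this.batchSize = batchSize;
        this.pauseMillis = pauseMillis;
        this.cursor = new AtomicLong(resumeFrom);
    }

    long cursor() {
        return cursor.get();
    }

    boolean finished() {
        return cursor.get() >= totalPages;
    }

    /** Runs at most maxBatches batches; returns true once every page was processed. */
    boolean run(LongConsumer markDirty, int maxBatches) {
        for (int batch = 0; batch < maxBatches && !finished(); batch++) {
            long from = cursor.get();
            long to = Math.min(from + batchSize, totalPages);

            for (long page = from; page < to; page++)
                markDirty.accept(page);

            // A real implementation would persist this value to survive restarts.
            cursor.set(to);

            // Throttling: pause between batches to limit the impact on user load.
            if (pauseMillis > 0 && !finished()) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return finished();
    }
}
```

Resuming after a restart then amounts to constructing a new scan with
resumeFrom set to the last saved cursor value; batchSize and pauseMillis
are the throttling knobs.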

On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin  wrote:

> Hello Igniters.
>
> Recently, master key rotation for Apache Ignite Transparent Data
> Encryption was implemented [1], but some security standards (PCI DSS
> at least) require rotation of all encryption keys [2]. Currently,
> encryption occurs when reading/writing pages to disk, cache encryption
> keys are stored in metastore.
>
> I'm going to contribute cache encryption key rotation and want to
> consult what is the best way to re-encrypting existing data, I see two
> different strategies.
>
> 1. In place re-encryption:
> Using the old key, sequentially read all the pages from the datastore,
> mark as dirty and log them into the WAL. After checkpoint pages will
> be stored to disk encrypted with the new key (as usual, along with
> updates). This strategy requires store the identifier (number) of the
> encryption key into the encrypted page.
> pros:
>   - can work in the background with minimal performance impact (this
> impact can be managed).
> cons:
>   - page duplication in the WAL may affect performance and historical
> rebalance.
>
> 2. Copy partition with re-encryption.
> This strategy is similar to partition snapshotting [3] - create
> partition copy encrypted with the new key and then replace the
> original partition file with the new one (see details [4]).
> pros:
>   - should work faster than "in place" re-encryption.
> cons:
>   - re-encryption in active cluster (and on unstable topology) can be
> difficult to implement.
>
> (See more detailed comparison [5])
>
> Re-encryption of existing data is a long and rare procedure (It is
> recommended to change the key every 6 months, but at least once every
> 2 years). Thus, re-encryption can be implemented for maintenance mode
> (for example, on a stable topology in a read-only cluster) and in such
> case the approach with partition copying seems simpler and faster.
>
> So, what do you think - do we need "online" re-encryption and which of
> the proposed options is best suited for this?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12186
> [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
> [3]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
> [4]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> .
> [5]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
>


[DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-05-13 Thread Pavel Pereslegin
Hello Igniters.

Recently, master key rotation for Apache Ignite Transparent Data
Encryption was implemented [1], but some security standards (PCI DSS
at least) require rotation of all encryption keys [2]. Currently,
encryption occurs when reading/writing pages to disk, and cache encryption
keys are stored in the metastore.

I'm going to contribute cache encryption key rotation and want to
ask what the best way to re-encrypt existing data is. I see two
different strategies.

1. In place re-encryption:
Using the old key, sequentially read all the pages from the datastore,
mark them as dirty and log them into the WAL. After a checkpoint, the pages
will be stored to disk encrypted with the new key (as usual, along with
updates). This strategy requires storing the identifier (number) of the
encryption key in the encrypted page.
pros:
  - can work in the background with minimal performance impact (this
impact can be managed).
cons:
  - page duplication in the WAL may affect performance and historical rebalance.

2. Copy partition with re-encryption.
This strategy is similar to partition snapshotting [3] - create
partition copy encrypted with the new key and then replace the
original partition file with the new one (see details [4]).
pros:
  - should work faster than "in place" re-encryption.
cons:
  - re-encryption in active cluster (and on unstable topology) can be
difficult to implement.
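
For illustration, the "copy partition with re-encryption" strategy can be
sketched in Java as "write a re-encrypted copy next to the partition file,
then swap it in". This is a rough sketch under stated assumptions, not the
design from [4]: PartitionCopySketch, the ".reencrypt.tmp" suffix and the
reencrypt callback (decrypt with the old key, encrypt with the new one,
same page size) are hypothetical. It needs Java 11+ for
InputStream.readNBytes(int), and whether an atomic move replaces an
existing target is implementation specific according to the Files.move
documentation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.UnaryOperator;

/** Hypothetical copy-then-swap re-encryption of a single partition file. */
final class PartitionCopySketch {
    static void copyAndSwap(Path partition, int pageSize, UnaryOperator<byte[]> reencrypt) {
        Path tmp = partition.resolveSibling(partition.getFileName() + ".reencrypt.tmp");
        try {
            try (InputStream in = Files.newInputStream(partition);
                 OutputStream out = Files.newOutputStream(tmp)) {
                byte[] page;
                // Read the file page by page; an empty array signals end of file.
                while ((page = in.readNBytes(pageSize)).length > 0)
                    out.write(reencrypt.apply(page));
            }
            // Swap the copy in only after it has been written completely.
            Files.move(tmp, partition, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            try {
                Files.deleteIfExists(tmp); // best-effort cleanup of the partial copy
            } catch (IOException ignored) {
                // Nothing more can be done here; the original file is untouched.
            }
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch ignores concurrent updates to the partition while the copy is
being made, which is the hard part on an active cluster noted in the cons
above.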

(See more detailed comparison [5])

Re-encryption of existing data is a long and rare procedure (it is
recommended to change the key every 6 months, but at least once every
2 years). Thus, re-encryption could be implemented for maintenance mode
(for example, on a stable topology in a read-only cluster), and in that
case the approach with partition copying seems simpler and faster.

So, what do you think - do we need "online" re-encryption and which of
the proposed options is best suited for this?

[1] https://issues.apache.org/jira/browse/IGNITE-12186
[2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
[3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
[4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
[5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison