On 11/06/18 12:22, Masahiko Sawada wrote:
On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
<moon_insung...@lab.ntt.co.jp> wrote:
Hello Hackers,

This propose a way to develop "Table-level" Transparent Data Encryption (TDE) 
and Key Management Service (KMS) support in
PostgreSQL.


Issues on data encryption of PostgreSQL
==========
Currently, in PostgreSQL, data encryption can be using pgcrypto Tool.
However, it is inconvenient to use pgcrypto to encrypts data in some cases.

There are two significant inconveniences.

First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto 
functions everywhere we encrypt/decrypt.
Second, we must modify application program code much if we want to do database 
migration to PostgreSQL from other databases that is
using TDE.

To resolved these inconveniences, many users want to support TDE.
There have also been a few proposals, comments, and questions to support TDE in 
the PostgreSQL community.

However, currently PostgreSQL does not support TDE, so in development 
community, there are discussions whether it's necessary to
support TDE or not.

In these discussions, there were requirements necessary to support TDE in 
PostgreSQL.

1) The performance overhead of encryption and decryption database data must be 
minimized
2) Need to support WAL encryption.
3) Need to support Key Management Service.

Therefore, I'd like to propose the new design of TDE that deals with both above 
requirements.
Since this feature will become very large, I'd like to hear opinions from 
community before starting making the patch.

First, my proposal is table-level TDE which is that user can specify tables 
begin encrypted.
Indexes, TOAST table and WAL associated with the table that enables TDE are 
also encrypted.

Moreover, I want to support encryption for large object as well.
But I haven't found a good way for it so far. So I'd like to remain it as 
future TODO.

My proposal has five characteristics features of "table-level TDE".

1) Buffer-level data encryption and decryption
2) Per-table encryption
3) 2-tier encryption key management
4) Working with external key management services(KMS)
5) WAL encryption

Here are more details for each items.


1. Buffer-level data encryption and decryption
==================
Transparent data encryption and decryption accompany by storage operation
With ordinally way like using pgcrypto, the biggest problem with encrypted data 
is the performance overhead of decrypting the data
each time the run to queries.

My proposal is to encrypt and decrypt data when performing DISK I/O operation 
to minimize performance overhead.
Therefore, the data in the shared memory layer is unencrypted so that 
performance overhead can minimize.

With this design, data encryption/decryption implementations can be developed 
by modifying the codes of the storage and buffer
manager modules,
which are responsible for performing DISK I/O operation.


2. Per-table encryption
==================
User can enable TDE per table as they want.
I introduce new storage parameter "encryption_enabled" which enables TDE at 
table-level.

     // Generate  the encryption table
        CREATE TABLE foo WITH ( ENCRYPTION_ENABLED = ON );

     // Change to the non-encryption table
        ALTER TABLE foo SET ( ENCRYPTION_ENABLED = OFF );

This approach minimizes the overhead for tables that do not require encryption 
options.
For tables that enable TDE, the corresponding table key will be generated with 
random values, and it's stored into the new system
catalog after being encrypted by the master key.

BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to use 
the IV in CBC mode for this proposal.
I'd like to hear opinions by security engineer.


3. 2-tier encryption key management
==================
when it comes time to change cryptographic keys, there is a performance 
overhead to decryption and re-encryption to all data.

To solve this problem we employee 2-tier encryption.
2-tier encryption is All table keys can be stored in the database cluster after 
being encrypted by the master key, And master keys
must be stored at external of PostgreSQL.

Therefore, without master key, it is impossible to decrypt the table key. Thus, 
It is impossible to decrypt the database data.

When changing the key, it's not necessary to re-encrypt for all data.
We use the new master key only to decrypt and re-encrypt the table key, these 
operations for minimizing the performance overhead.

For table keys, all TDE-enabled tables have different table keys.
And for master key, all database have different master keys. Table keys are 
encrypted by the master key of its own database.
For WAL encryption, we have another cryptographic key. WAL-key is also 
encrypted by a master key, but it is shared across the
database cluster.


4. Working with external key management services(KMS)
==================
A key management service is an integrated approach for generating, fetching and 
managing encryption keys for key control.
They may cover all aspects of security from the secure generation of keys, 
secure storing keys, and secure fetching keys up to
encryption key handling.
Also, various types of KMSs are provided by many companies, and users can 
choose them.

Therefore I would like to manage the master key using KMS.
Also, my proposal is to create callback APIs(generate_key, fetch_key, 
store_key) in the form of a plug-in so that users can use many
types of KMS as they want.

In KMIP protocol and most KMS manage keys by string IDs. We can get keys by key 
ID from KMS.
So in my proposal, all master keys are distinguished by its ID, called "master key 
ID".
The master key ID is made, for example, using the database oid and a sequence number, like 
<OID>_<SeqNo>. And they are managed in
PostgreSQL.

When database startup, all master key ID is loaded to shared memory, and they 
are protected by LWLock.

When it comes time to rotate the master keys, run this query.

         ALTER SYSTEM ROTATION MASTER KEY;

In this query, the master key is rotated with the following step.
1. Generate new master key,
2. Change master key IDs and emit corresponding WAL
3. Re-encrypt all table keys on its database

Also during checkpoint, master key IDs on shared memory become a permanent 
condition.


5. WAL encryption
==================
If we encrypt all WAL records, performance overhead can be significant.
Therefore, this proposes a method to encrypt only WAL record excluding WAL 
header when writing WAL on the WAL buffer, instead of
encrypting a whole WAL record.
WAL encryption key is generated separately when the TDE-enabled table is 
created the first time. We use 2-tier encryption for WAL
encryption as well.
So, when it comes time to rotate the WAL encryption key, run this query.

         ALTER SYSTEM ROTATION WAL KEY;

Next, I will explain how to encrypt WAL.

To do this operation, I add a flag to WAL header which indicates whether the 
subsequent WAL data is encrypted or not.

Then, when we write WAL for encryption table we write "encrypted" WAL on WAL 
buffer layer.

In recovery, we read WAL header and check the flag of encryption, and judges 
whether WAL must be decrypted.
In the case of PITR, we use WAL key ID in the backup file.

With this approach, the performance overhead of writing and reading the WAL for 
unencrypted tables would be almost the same as
before.



    I may have missed part of the conversation and/or this may be a naïve question, but what about pg_stats? I guess data should be encrypted there too, and I wonder how this would affect the query planner and how it could decrypt this information. Also would a separate key be used for the stats?


    Thanks,

    Álvaro


--

Alvaro Hernandez


-----------
OnGres


Reply via email to