Hello,

Following up on the RFC, I am submitting the initial patch set for the
proposed infrastructure. These patches introduce a minimal hook-based
protocol to allow extensions to handle data transformation, such as TDE,
while keeping the PostgreSQL core independent of specific cryptographic
implementations.

Implementation Details:

Hook Points in Storage I/O Path
The patch introduces five strategic hook points:

mdread_post_hook: Called after blocks are read from disk. The extension can
reverse-transform data in place.

mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending
blocks. These hooks return a pointer to transformed buffers.

xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for WAL
records during insertion and replay.

Data Integrity and Checksum Protocol
To ensure robust error detection, the hooks follow a specific verification
protocol:

On Write: The extension transforms the page, sets the Transform ID, then
recalculates the checksum on the transformed data.

On Read: The extension verifies the on-disk checksum of the transformed
data first. After reverse-transformation, it clears the Transform ID and
recalculates the checksum for the plaintext data. This ensures corruption
is detected regardless of the transformation state.
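
To make the ordering concrete, here is a minimal standalone sketch of the write/read protocol described above. Everything in it is a stand-in I made up for illustration: ToyPage, toy_checksum, and the XOR "cipher" are hypothetical and are not the structures or algorithms used by the patch or by test_tde (which uses the real page checksum and AES-256-CTR). Only the order of steps is meant to match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAYLOAD 16

/* Hypothetical miniature page: checksum, Transform ID, payload. */
typedef struct ToyPage
{
    uint16_t checksum;
    uint8_t  transform_id;      /* 0 means "not transformed" */
    uint8_t  payload[TOY_PAYLOAD];
} ToyPage;

/* Toy checksum over the Transform ID and payload (not the real page checksum). */
static uint16_t
toy_checksum(const ToyPage *p)
{
    uint32_t sum = p->transform_id;

    for (size_t i = 0; i < TOY_PAYLOAD; i++)
        sum = sum * 31 + p->payload[i];
    return (uint16_t) (sum & 0xFFFF);
}

/* Toy reversible transform standing in for a real cipher. */
static void
toy_xor(ToyPage *p, uint8_t key)
{
    for (size_t i = 0; i < TOY_PAYLOAD; i++)
        p->payload[i] ^= key;
}

/* On write: transform, set the Transform ID, then checksum the transformed image. */
static void
toy_write_pre(ToyPage *p, uint8_t key, uint8_t id)
{
    assert(id != 0);            /* a non-zero ID marks a transformed page */
    toy_xor(p, key);
    p->transform_id = id;
    p->checksum = toy_checksum(p);
}

/*
 * On read: verify the on-disk checksum first, then reverse the transform,
 * clear the Transform ID, and recompute the checksum for the plaintext.
 * Returns false if the on-disk image is corrupt.
 */
static bool
toy_read_post(ToyPage *p, uint8_t key)
{
    if (p->transform_id == 0)
        return true;            /* untransformed: assumed left to the core's own check */
    if (p->checksum != toy_checksum(p))
        return false;
    toy_xor(p, key);
    p->transform_id = 0;
    p->checksum = toy_checksum(p);
    return true;
}
```

In this sketch a corrupted on-disk image is rejected before any reverse transformation is attempted, and a page that passes leaves the read path with a checksum that matches its plaintext contents.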

WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
For WAL records, I have introduced a dedicated block ID (251) to mark
transformed data. If the transforming extension is not loaded, the WAL
reader encounters this unknown block ID and fails fast instead of
interpreting encrypted data as valid WAL records.
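
As a rough illustration of why an unknown ID is rejected, the sketch below classifies a block ID byte the way a decoder without the extension would. The XLR_* values match xlogrecord.h (251 is the one added by this patch); the enum and the function name are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Values as defined in src/include/access/xlogrecord.h. */
#define XLR_MAX_BLOCK_ID            32
#define XLR_BLOCK_ID_DATA_SHORT     255
#define XLR_BLOCK_ID_DATA_LONG      254
#define XLR_BLOCK_ID_ORIGIN         253
#define XLR_BLOCK_ID_TOPLEVEL_XID   252
#define XLR_BLOCK_ID_TRANSFORMED    251     /* added by this patch */

/* Hypothetical classification, for illustration only. */
typedef enum DemoIdKind
{
    DEMO_ID_BLOCK_REF,
    DEMO_ID_MAIN_DATA,
    DEMO_ID_ORIGIN,
    DEMO_ID_TOPLEVEL_XID,
    DEMO_ID_INVALID
} DemoIdKind;

/*
 * Classify a block ID as a reader that knows nothing about transformed
 * records would: anything outside the known set is invalid, so a record
 * that still starts with 251 is rejected rather than decoded.
 */
static DemoIdKind
demo_classify_block_id(uint8_t block_id)
{
    if (block_id == XLR_BLOCK_ID_DATA_SHORT ||
        block_id == XLR_BLOCK_ID_DATA_LONG)
        return DEMO_ID_MAIN_DATA;
    if (block_id == XLR_BLOCK_ID_ORIGIN)
        return DEMO_ID_ORIGIN;
    if (block_id == XLR_BLOCK_ID_TOPLEVEL_XID)
        return DEMO_ID_TOPLEVEL_XID;
    if (block_id <= XLR_MAX_BLOCK_ID)
        return DEMO_ID_BLOCK_REF;
    return DEMO_ID_INVALID;
}
```

When the extension is loaded, the decode hook is expected to have reversed the transformation before the record reaches this point, so the decoder never sees 251.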

PageHeader Transform ID (5-bit)
I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform ID.
This allows the engine and extensions to identify the transformation state
of a page (e.g., key versioning or algorithm type) without attempting
decryption. It ensures backward compatibility: pages with Transform ID 0
are treated as standard untransformed pages.
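
For readers who want to check the bit layout without applying the patch, here is a small standalone sketch of the packing. The mask and shift values are the ones from the patch's bufpage.h changes; the two helper functions are simplified stand-ins that operate on a bare flags value rather than on a Page, and their names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the patch's bufpage.h definitions. */
#define PD_TRANSFORM_ID_MASK    0x00F8  /* bits 3-7 of pd_flags */
#define PD_TRANSFORM_ID_SHIFT   3
#define PD_TRANSFORM_NONE       0       /* not transformed */

/* Extract the 5-bit Transform ID from a pd_flags value. */
static inline uint8_t
demo_flags_get_transform_id(uint16_t pd_flags)
{
    return (uint8_t) ((pd_flags & PD_TRANSFORM_ID_MASK) >> PD_TRANSFORM_ID_SHIFT);
}

/* Return pd_flags with the Transform ID replaced; bits 0-2 are preserved. */
static inline uint16_t
demo_flags_set_transform_id(uint16_t pd_flags, uint8_t id)
{
    assert(id <= 31);           /* only 5 bits are available */
    return (uint16_t) ((pd_flags & ~PD_TRANSFORM_ID_MASK) |
                       (((uint16_t) id << PD_TRANSFORM_ID_SHIFT) &
                        PD_TRANSFORM_ID_MASK));
}
```

Because the existing flags occupy bits 0-2, a page written by an unpatched server always reads back as PD_TRANSFORM_NONE.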

Memory and Critical Section Safety
As demonstrated in the contrib/test_tde reference implementation, cipher
contexts are pre-allocated in _PG_init to avoid memory allocation during
critical sections. For WAL transformation,
MemoryContextAllowInCriticalSection() is used to allow buffer reallocation
within critical sections; if OOM occurs during buffer growth, it results in
a controlled PANIC.

Performance Considerations
When hooks are not set (default), the overhead is limited to a single NULL
pointer comparison per I/O operation. This is architecturally consistent
with existing PostgreSQL hooks and is designed to have a negligible impact
on performance.
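
The calling pattern can be sketched in isolation as follows. This is a simplified model, not code from the patch: the demo_* names, the single-buffer signature, and the 8-byte block are assumptions made to keep the example short. It shows the NULL check on the core side and, on the extension side, saving and calling any previously installed hook and returning a pointer to extension-owned static storage.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BLCKSZ 8

typedef const void *(*demo_write_pre_hook_type) (const void *buffer);

/* "Core" side: the hook variable is NULL unless an extension sets it. */
static demo_write_pre_hook_type demo_write_pre_hook = NULL;

/* Returns the buffer that would be handed to the actual write. */
static const void *
demo_write(const void *buffer)
{
    assert(buffer != NULL);
    if (demo_write_pre_hook)    /* the only cost when no hook is set */
        buffer = demo_write_pre_hook(buffer);
    return buffer;
}

/* "Extension" side: remember the previous hook so hooks can be chained. */
static demo_write_pre_hook_type prev_demo_write_pre_hook = NULL;
static unsigned char demo_scratch[DEMO_BLCKSZ];    /* extension-owned output */

static const void *
demo_ext_write_pre(const void *buffer)
{
    const unsigned char *src;

    if (prev_demo_write_pre_hook)
        buffer = prev_demo_write_pre_hook(buffer);

    /* Toy transform into static storage; the caller's buffer is untouched. */
    src = buffer;
    for (size_t i = 0; i < DEMO_BLCKSZ; i++)
        demo_scratch[i] = (unsigned char) (src[i] ^ 0x20);
    return demo_scratch;
}

/* What an extension's _PG_init would do in this simplified model. */
static void
demo_ext_init(void)
{
    prev_demo_write_pre_hook = demo_write_pre_hook;
    demo_write_pre_hook = demo_ext_write_pre;
}
```

Before demo_ext_init() runs, demo_write() returns the caller's buffer unchanged; afterwards it returns the transformed copy, which leaves the shared buffer contents intact.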

Attached Patches:

v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch: Core
infrastructure.
v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference
implementation using AES-256-CTR.

I look forward to your comments and feedback.

Regards,

Henson Choi

On Sun, Dec 28, 2025 at 4:49 PM, Henson Choi <[email protected]> wrote:

> RFC: PostgreSQL Storage I/O Transformation Hooks Infrastructure for a
> Technical Protocol Between RDBMS Core and Data Security Experts
>
> *Author:* Henson Choi [email protected]
>
> *Date:* 2025-12-28
>
> *PostgreSQL Version:* master (Development)
> ------------------------------
> 1. Summary & Motivation
>
> This RFC proposes the introduction of minimal hooks into the PostgreSQL
> storage layer and the addition of a *Transformation ID* field to the
> PageHeader.
>
> A Diplomatic Protocol Between Expert Groups
>
> The core motivation of this proposal is *“Separation of Concerns and
> Mutual Respect.”*
>
> Historically, discussions around Transparent Data Encryption (TDE) have
> often felt like putting security experts on trial in a foreign
> court—specifically, the “Court of RDBMS.” It is time to treat them not as
> defendants to be judged by database-specific rules, but as an *equal
> neighboring community* with their own specialized sovereignty.
>
> *The issue has never been a failure of technology, but rather a
> misplacement of the focal point.* While previous discussions were mired
> in the technicalities of “how to hardcode encryption into the core,” this
> proposal shifts the debate toward an architectural solution: “what
> interface the core should provide to external experts.”
>
>    - *RDBMS Experts* provide a trusted pipeline responsible for data I/O
>    paths and consistency.
>    - *Security Experts* take responsibility for the specialized domain of
>    encryption algorithms and key management.
>
> This hook system functions as a *Technical Protocol*—a high-level
> agreement that allows these two expert groups to exchange data securely
> without encroaching on each other’s territory.
> ------------------------------
> 2. Design Principles
>
>    1. *Delegation of Authority:* The core remains independent of specific
>    encryption standards, providing a “free territory” where security experts
>    can respond to an ever-changing security landscape.
>    2. *Diplomatic Convention:* The Transformation ID acts as a
>    communication protocol between the engine and the extension. The engine
>    uses this ID to identify the state of the data and hands over control to
>    the appropriate expert (the extension).
>    3. *Minimal Interference:* Overhead is kept near zero when hooks are
>    not in use, ensuring the native performance of the PostgreSQL engine.
>
> ------------------------------
> 3. Proposal Specifications
>
> 3.1 The Interface (Hook Points)
>
> We allow intervention by security experts through five contact points
> along the I/O path:
>
>    - *Read/Write Hooks:* mdread_post, mdwrite_pre, mdextend_pre
>    (Transformation of the data area)
>    - *WAL Hooks:* xlog_insert_pre, xlog_decode_pre (Transformation of
>    transaction logs)
>
> 3.2 The Protocol Identifier (PageHeader Transformation ID)
>
> We allocate 5 bits of pd_flags to define the “Security State” of a page.
> This serves as a *Status Message* sent by the security expert to the
> engine, utilized for key versioning and as a migration marker.
> ------------------------------
> 4. Reference Implementation: contrib/test_tde
>
> A Standard Code of Conduct for Security Experts
>
> This reference implementation exists not as a commercial product, but to
> define the *Standards of the Diplomatic Protocol* that
> encryption/decryption experts must follow when entering the PostgreSQL
> domain.
>
>    1. *Deterministic IV Derivation:* Demonstrates how to achieve
>    cryptographic safety by trusting unique values provided by the engine
>    (e.g., LSN).
>    2. *Critical Section Safety:* Defines memory management regulations
>    that security logic must follow within “Critical Sections” to maintain
>    system stability.
>    3. *Hook Chaining:* Demonstrates a cooperative structure that allows
>    peaceful coexistence with other expert tools (e.g., compression, auditing).
>
> ------------------------------
> 5. Scope
>
>    - *In-Scope:* Backend hook infrastructure, Transformation ID field,
>    and reference code demonstrating diplomatic protocol compliance.
>    - *Out-of-Scope:* Specific Key Management Systems (KMS), selection of
>    specific cryptographic algorithms, and integration with external tools.
>
> This proposal represents a strategic diplomatic choice: rather than the
> PostgreSQL core assuming all security responsibilities, it grants security
> experts a *sovereign territory through extensions* where they can perform
> at their best.
>
From 39d19fc7127124e007ce6bede487209afba6d827 Mon Sep 17 00:00:00 2001
From: Henson Choi <[email protected]>
Date: Tue, 2 Dec 2025 21:50:12 +0900
Subject: [PATCH] Add Storage I/O Transform Hooks for PostgreSQL

This patch introduces a set of hook points that allow extensions to
intercept and transform data during storage I/O operations.  The hooks
are designed to support transparent data encryption (TDE) and similar
use cases that require data transformation at the storage layer.

The following hooks are added:

  - mdread_post_hook in md.c for reverse-transforming blocks after they
    are read (also invoked from bufmgr.c on AIO read completion)
  - mdwrite_pre_hook / mdextend_pre_hook in md.c for transforming blocks
    before they are written or a relation is extended
  - xlog_insert_pre_hook in xloginsert.c for WAL record transformation
    after assembly and before insertion
  - xlog_decode_pre_hook in xlogreader.c for WAL record transformation
    before validation and decoding

Each hook is optional and defaults to NULL, so the only cost when no
extension is loaded is a NULL pointer check per call site.

Author: Henson Choi <[email protected]>
---
 src/backend/access/transam/xloginsert.c | 10 ++++
 src/backend/access/transam/xlogreader.c | 21 ++++++++
 src/backend/storage/buffer/bufmgr.c     |  9 ++++
 src/backend/storage/smgr/md.c           | 20 ++++++++
 src/include/access/xloginsert.h         | 20 ++++++++
 src/include/access/xlogreader.h         | 20 ++++++++
 src/include/access/xlogrecord.h         |  5 ++
 src/include/storage/bufpage.h           | 25 +++++++++-
 src/include/storage/md.h                | 65 +++++++++++++++++++++++++
 9 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a56d5a55282..f518ef3f16f 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -136,6 +136,12 @@ static bool begininsert_called = false;
 /* Memory context to hold the registered buffer and data references. */
 static MemoryContext xloginsert_cxt;
 
+/*
+ * Hook variable for WAL insert transformation (e.g., encryption).
+ * Extensions can set this hook to transform assembled WAL data before insertion.
+ */
+xlog_insert_pre_hook_type xlog_insert_pre_hook = NULL;
+
 static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
 									   XLogRecPtr RedoRecPtr, bool doPageWrites,
 									   XLogRecPtr *fpw_lsn, int *num_fpi,
@@ -526,6 +532,10 @@ XLogInsert(RmgrId rmid, uint8 info)
 								 &fpw_lsn, &num_fpi, &fpi_bytes,
 								 &topxid_included);
 
+		/* Pre-insert hook for transformation (e.g., encryption) */
+		if (xlog_insert_pre_hook)
+			rdt = xlog_insert_pre_hook(rdt);
+
 		EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpi,
 								  fpi_bytes, topxid_included);
 	} while (!XLogRecPtrIsValid(EndPos));
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 5e5001b2101..169f2b06fc5 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -40,6 +40,13 @@
 #include "common/logging.h"
 #endif
 
+/*
+ * Hook variable for WAL record transformation (e.g., decryption).
+ * Extensions can set this hook to transform raw WAL data before decoding.
+ * Frontend tools can also set this hook at startup.
+ */
+xlog_decode_pre_hook_type xlog_decode_pre_hook = NULL;
+
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...)
 			pg_attribute_printf(2, 3);
 static void allocate_recordbuf(XLogReaderState *state, uint32 reclength);
@@ -843,6 +850,11 @@ restart:
 		Assert(gotheader);
 
 		record = (XLogRecord *) state->readRecordBuf;
+
+		/* Pre-validation hook for transformation (e.g., decryption) */
+		if (xlog_decode_pre_hook)
+			record = xlog_decode_pre_hook(state, record, RecPtr, true);
+
 		if (!ValidXLogRecord(state, record, RecPtr))
 			goto err;
 
@@ -862,6 +874,15 @@ restart:
 			goto err;
 
 		/* Record does not cross a page boundary */
+
+		/*
+		 * Pre-validation hook for transformation (e.g., decryption).
+		 * inplace_allowed is false because record points to readBuf, which
+		 * may be copied back to WAL files (e.g., FinishWalRecovery).
+		 */
+		if (xlog_decode_pre_hook)
+			record = xlog_decode_pre_hook(state, record, RecPtr, false);
+
 		if (!ValidXLogRecord(state, record, RecPtr))
 			goto err;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index eb55102b0d7..eb13a17fa94 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -57,6 +57,7 @@
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
+#include "storage/md.h"
 #include "storage/proc.h"
 #include "storage/read_stream.h"
 #include "storage/smgr.h"
@@ -7401,6 +7402,14 @@ buffer_readv_complete_one(PgAioTargetData *td, uint8 buf_off, Buffer buffer,
 			VALGRIND_MAKE_MEM_DEFINED(bufdata, BLCKSZ);
 #endif
 
+		/* Decrypt block before checksum verification */
+		if (mdread_post_hook)
+		{
+			RelFileLocator rlocator = BufTagGetRelFileLocator(&tag);
+
+			mdread_post_hook(&rlocator, tag.forkNum, tag.blockNum, &bufdata, 1);
+		}
+
 		if (!PageIsVerified((Page) bufdata, tag.blockNum, piv_flags,
 							failed_checksum))
 		{
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 71bcdeb6601..5416128d2cc 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -96,6 +96,14 @@ typedef struct _MdfdVec
 
 static MemoryContext MdCxt;		/* context for all MdfdVec objects */
 
+/*
+ * Hook variables for I/O transformation (e.g., encryption/decryption).
+ * Extensions can set these hooks to transform data during storage I/O.
+ */
+mdread_post_hook_type mdread_post_hook = NULL;
+mdwrite_pre_hook_type mdwrite_pre_hook = NULL;
+mdextend_pre_hook_type mdextend_pre_hook = NULL;
+
 
 /* Populate a file tag describing an md.c segment file. */
 #define INIT_MD_FILETAG(a,xx_rlocator,xx_forknum,xx_segno) \
@@ -513,6 +521,10 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 						relpath(reln->smgr_rlocator, forknum).str,
 						InvalidBlockNumber)));
 
+	/* Pre-extend hook for transformation (e.g., encryption) */
+	if (mdextend_pre_hook)
+		buffer = mdextend_pre_hook(&reln->smgr_rlocator.locator, forknum, blocknum, buffer);
+
 	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
 
 	seekpos = (pgoff_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
@@ -972,6 +984,10 @@ mdreadv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 			iovcnt = compute_remaining_iovec(iov, iov, iovcnt, nbytes);
 		}
 
+		/* Post-read hook for transformation (e.g., decryption) */
+		if (mdread_post_hook)
+			mdread_post_hook(&reln->smgr_rlocator.locator, forknum, blocknum, buffers, nblocks_this_segment);
+
 		nblocks -= nblocks_this_segment;
 		buffers += nblocks_this_segment;
 		blocknum += nblocks_this_segment;
@@ -1064,6 +1080,10 @@ mdwritev(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	Assert((uint64) blocknum + (uint64) nblocks <= (uint64) mdnblocks(reln, forknum));
 #endif
 
+	/* Pre-write hook for transformation (e.g., encryption) */
+	if (mdwrite_pre_hook)
+		buffers = mdwrite_pre_hook(&reln->smgr_rlocator.locator, forknum, blocknum, buffers, nblocks);
+
 	while (nblocks > 0)
 	{
 		struct iovec iov[PG_IOV_MAX];
diff --git a/src/include/access/xloginsert.h b/src/include/access/xloginsert.h
index d6a71415d4f..cc54459ad33 100644
--- a/src/include/access/xloginsert.h
+++ b/src/include/access/xloginsert.h
@@ -19,6 +19,26 @@
 #include "storage/relfilelocator.h"
 #include "utils/relcache.h"
 
+/* Forward declaration for XLogRecData */
+struct XLogRecData;
+
+/*
+ * Hook function type for WAL insert transformation (e.g., encryption).
+ * Called after XLogRecordAssemble() but before XLogInsertRecord().
+ * Extension can transform the assembled WAL record data for encryption.
+ * Returns the (possibly modified) XLogRecData chain to be inserted.
+ *
+ * The first node's data points to XLogRecord header, which contains
+ * xl_rmid and xl_info if needed by the hook.
+ *
+ * On failure, the hook should either PANIC or return the original rdata
+ * as fallback.
+ */
+typedef struct XLogRecData *(*xlog_insert_pre_hook_type) (struct XLogRecData *rdata);
+
+/* Hook variable for WAL insert transformation */
+extern PGDLLIMPORT xlog_insert_pre_hook_type xlog_insert_pre_hook;
+
 /*
  * The minimum size of the WAL construction working area. If you need to
  * register more than XLR_NORMAL_MAX_BLOCK_ID block references or have more
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index dfabbbd57d4..898d52a1013 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -400,6 +400,26 @@ extern bool DecodeXLogRecord(XLogReaderState *state,
 							 XLogRecPtr lsn,
 							 char **errormsg);
 
+/*
+ * Hook function type for WAL record transformation (e.g., decryption).
+ * Called before ValidXLogRecord() and DecodeXLogRecord().
+ * Extension can decrypt or transform the raw record data.
+ * Returns the (possibly modified) XLogRecord to be validated and decoded.
+ *
+ * If inplace_allowed is true, the hook may modify the record in place.
+ * If false, the hook must allocate a new buffer and return it.
+ *
+ * On failure, the hook should either PANIC or return the original record
+ * as fallback.
+ */
+typedef XLogRecord *(*xlog_decode_pre_hook_type) (XLogReaderState *state,
+												  XLogRecord *record,
+												  XLogRecPtr lsn,
+												  bool inplace_allowed);
+
+/* Hook variable for WAL record transformation */
+extern PGDLLIMPORT xlog_decode_pre_hook_type xlog_decode_pre_hook;
+
 /*
  * Macros that provide access to parts of the record most recently returned by
  * XLogReadRecord() or XLogNextRecord().
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index a06833ce0a3..9cfb2aff5ae 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -244,5 +244,10 @@ typedef struct XLogRecordDataHeaderLong
 #define XLR_BLOCK_ID_DATA_LONG		254
 #define XLR_BLOCK_ID_ORIGIN			253
 #define XLR_BLOCK_ID_TOPLEVEL_XID	252
+/*
+ * I/O transform hook marker. Uses same header format as XLogRecordDataHeaderLong
+ * (1 byte id + 4 bytes length). Use SizeOfXLogRecordDataHeaderLong for size.
+ */
+#define XLR_BLOCK_ID_TRANSFORMED	251
 
 #endif							/* XLOGRECORD_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index abc2cf2a020..f18f77d3d22 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -189,7 +189,17 @@ typedef PageHeaderData *PageHeader;
 #define PD_ALL_VISIBLE		0x0004	/* all tuples on page are visible to
 									 * everyone */
 
-#define PD_VALID_FLAG_BITS	0x0007	/* OR of all valid pd_flags bits */
+/*
+ * Transform ID field (5 bits: values 0-31) for I/O transform extensions.
+ * Value 0 means the page is not transformed (backward compatible).
+ * Values 1-31 are available for extensions to define their own meanings
+ * (e.g., encryption key versions, algorithm identifiers, migration markers).
+ */
+#define PD_TRANSFORM_ID_MASK	0x00F8	/* bits 3-7 */
+#define PD_TRANSFORM_ID_SHIFT	3
+#define PD_TRANSFORM_NONE		0		/* not transformed (core reserved) */
+
+#define PD_VALID_FLAG_BITS	0x00FF	/* OR of all valid pd_flags bits */
 
 /*
  * Page layout version number 0 is for pre-7.3 Postgres releases.
@@ -441,6 +451,19 @@ PageClearAllVisible(Page page)
 	((PageHeader) page)->pd_flags &= ~PD_ALL_VISIBLE;
 }
 
+static inline uint8
+PageGetTransformId(const PageData *page)
+{
+	return (((const PageHeaderData *) page)->pd_flags & PD_TRANSFORM_ID_MASK) >> PD_TRANSFORM_ID_SHIFT;
+}
+static inline void
+PageSetTransformId(Page page, uint8 id)
+{
+	((PageHeader) page)->pd_flags =
+		(((PageHeader) page)->pd_flags & ~PD_TRANSFORM_ID_MASK) |
+		((id << PD_TRANSFORM_ID_SHIFT) & PD_TRANSFORM_ID_MASK);
+}
+
 /*
  * These two require "access/transam.h", so left as macros.
  */
diff --git a/src/include/storage/md.h b/src/include/storage/md.h
index b563c27abf0..0a766a2b61f 100644
--- a/src/include/storage/md.h
+++ b/src/include/storage/md.h
@@ -22,6 +22,71 @@
 
 extern PGDLLIMPORT const PgAioHandleCallbacks aio_md_readv_cb;
 
+/*
+ * Hook function types for I/O transformation (e.g., encryption/decryption).
+ * These hooks allow extensions to transform data during storage I/O operations.
+ */
+
+/*
+ * Called after blocks are read from disk, before PostgreSQL's checksum verification.
+ * Extension can reverse-transform (e.g., decrypt) the data in place.
+ *
+ * For synchronous reads, called from mdreadv() after read completes.
+ * For AIO reads, called from buffer_readv_complete_one() before PageIsVerified().
+ *
+ * Note: The hook is responsible for verifying on-disk checksum before reverse
+ * transformation and recalculating checksum after transformation. This ensures
+ * data integrity is verified at both stages and PostgreSQL's checksum verification
+ * passes.
+ *
+ * On failure, the hook should raise an ERROR (or PANIC for critical errors).
+ */
+typedef void (*mdread_post_hook_type) (RelFileLocator *rlocator,
+									   ForkNumber forknum,
+									   BlockNumber blocknum,
+									   void **buffers,
+									   BlockNumber nblocks);
+
+/*
+ * Called before mdwritev() writes blocks to disk.
+ * Extension can transform (e.g., encrypt) data.
+ * Returns pointer to transformed buffers array (hook manages the memory,
+ * typically using static local storage).
+ *
+ * Note: The hook should recalculate checksum on transformed data after
+ * transformation. This on-disk checksum will be verified on read before
+ * reverse transformation, ensuring disk-level data integrity.
+ *
+ * On failure, the hook should raise an ERROR (or PANIC for critical errors),
+ * or return the original buffers with a WARNING as fallback.
+ */
+typedef const void **(*mdwrite_pre_hook_type) (RelFileLocator *rlocator,
+											   ForkNumber forknum,
+											   BlockNumber blocknum,
+											   const void **buffers,
+											   BlockNumber nblocks);
+
+/*
+ * Called before mdextend() extends a relation with new blocks.
+ * Returns pointer to transformed buffer (hook manages the memory,
+ * typically using static local storage).
+ *
+ * Note: Same as write hook - the hook should recalculate checksum on
+ * transformed data after transformation.
+ *
+ * On failure, the hook should raise an ERROR (or PANIC for critical errors),
+ * or return the original buffer with a WARNING as fallback.
+ */
+typedef const void *(*mdextend_pre_hook_type) (RelFileLocator *rlocator,
+											   ForkNumber forknum,
+											   BlockNumber blocknum,
+											   const void *buffer);
+
+/* Hook variables for I/O transformation */
+extern PGDLLIMPORT mdread_post_hook_type mdread_post_hook;
+extern PGDLLIMPORT mdwrite_pre_hook_type mdwrite_pre_hook;
+extern PGDLLIMPORT mdextend_pre_hook_type mdextend_pre_hook;
+
 /* md storage manager functionality */
 extern void mdinit(void);
 extern void mdopen(SMgrRelation reln);
-- 
2.50.1 (Apple Git-155)

From f7837456638f37c0555f821822f7a5d113a68cce Mon Sep 17 00:00:00 2001
From: Henson Choi <[email protected]>
Date: Tue, 2 Dec 2025 21:51:13 +0900
Subject: [PATCH] Add test_tde extension for TDE testing

This extension provides a reference implementation for validating the
Storage I/O Transform Hooks introduced in the previous commit.  It uses
AES-256-CTR encryption with IV derived from page metadata (LSN, block
number, relation file number) to ensure uniqueness.

The extension registers hooks for:

  - Buffer page read/write transformation (mdread/mdwrite/mdextend)
  - WAL record insert and replay transformation

Key features:
  - Encryption key configured via test_tde.key GUC (256-bit hex)
  - System catalogs and pg_global tablespace excluded from encryption
  - Pre-allocated cipher context to avoid allocation in critical sections
  - WAL records marked with block ID 251 for encrypted record detection

This is intended for development and testing purposes only, not for
production use.  The implementation lacks key rotation, proper key
management, and security auditing.

Author: Henson Choi <[email protected]>
---
 contrib/Makefile                    |    4 +-
 contrib/test_tde/.gitignore         |    3 +
 contrib/test_tde/Makefile           |   27 +
 contrib/test_tde/expected/basic.out |  177 +++++
 contrib/test_tde/sql/basic.sql      |  146 ++++
 contrib/test_tde/test_tde.c         | 1131 +++++++++++++++++++++++++++
 contrib/test_tde/test_tde.conf      |    2 +
 7 files changed, 1488 insertions(+), 2 deletions(-)
 create mode 100644 contrib/test_tde/.gitignore
 create mode 100644 contrib/test_tde/Makefile
 create mode 100644 contrib/test_tde/expected/basic.out
 create mode 100644 contrib/test_tde/sql/basic.sql
 create mode 100644 contrib/test_tde/test_tde.c
 create mode 100644 contrib/test_tde/test_tde.conf

diff --git a/contrib/Makefile b/contrib/Makefile
index 2f0a88d3f77..151eb823850 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -54,9 +54,9 @@ SUBDIRS = \
 		vacuumlo
 
 ifeq ($(with_ssl),openssl)
-SUBDIRS += pgcrypto sslinfo
+SUBDIRS += pgcrypto sslinfo test_tde
 else
-ALWAYS_SUBDIRS += pgcrypto sslinfo
+ALWAYS_SUBDIRS += pgcrypto sslinfo test_tde
 endif
 
 ifneq ($(with_uuid),no)
diff --git a/contrib/test_tde/.gitignore b/contrib/test_tde/.gitignore
new file mode 100644
index 00000000000..2ea3752951a
--- /dev/null
+++ b/contrib/test_tde/.gitignore
@@ -0,0 +1,3 @@
+log
+results
+tmp_check
diff --git a/contrib/test_tde/Makefile b/contrib/test_tde/Makefile
new file mode 100644
index 00000000000..b2455d3831e
--- /dev/null
+++ b/contrib/test_tde/Makefile
@@ -0,0 +1,27 @@
+# contrib/test_tde/Makefile
+
+MODULE_big = test_tde
+OBJS = \
+	$(WIN32RES) \
+	test_tde.o
+
+PGFILEDESC = "test_tde - reference implementation for I/O transform hooks"
+
+REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_tde/test_tde.conf
+REGRESS = basic
+# Disabled because these tests require "shared_preload_libraries=test_tde"
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_tde
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+# OpenSSL is required for encryption
+SHLIB_LINK += $(filter -lcrypto, $(LIBS))
diff --git a/contrib/test_tde/expected/basic.out b/contrib/test_tde/expected/basic.out
new file mode 100644
index 00000000000..9932cf43614
--- /dev/null
+++ b/contrib/test_tde/expected/basic.out
@@ -0,0 +1,177 @@
+-- Basic test for test_tde extension
+-- Verify that encryption/decryption works correctly
+-- Show current settings
+SHOW test_tde.key;
+                           test_tde.key                           
+------------------------------------------------------------------
+ 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
+(1 row)
+
+-- Create a test table
+CREATE TABLE test_encrypt (
+    id serial PRIMARY KEY,
+    secret_data text,
+    secret_number integer
+);
+-- Insert some data
+INSERT INTO test_encrypt (secret_data, secret_number) VALUES
+    ('This is secret data', 12345),
+    ('Another secret message', 67890),
+    ('PostgreSQL TDE test', 11111);
+-- Force a checkpoint to ensure data is written to disk
+CHECKPOINT;
+-- Read data back - should be decrypted correctly
+SELECT * FROM test_encrypt ORDER BY id;
+ id |      secret_data       | secret_number 
+----+------------------------+---------------
+  1 | This is secret data    |         12345
+  2 | Another secret message |         67890
+  3 | PostgreSQL TDE test    |         11111
+(3 rows)
+
+-- Update some data
+UPDATE test_encrypt SET secret_data = 'Updated secret' WHERE id = 1;
+-- Verify update worked
+SELECT * FROM test_encrypt WHERE id = 1;
+ id |  secret_data   | secret_number 
+----+----------------+---------------
+  1 | Updated secret |         12345
+(1 row)
+
+-- Test with larger data
+INSERT INTO test_encrypt (secret_data, secret_number)
+SELECT
+    repeat('Large data block ', 100),
+    generate_series
+FROM generate_series(1, 10);
+-- Count rows
+SELECT COUNT(*) FROM test_encrypt;
+ count 
+-------
+    13
+(1 row)
+
+-- Test with NULL values
+INSERT INTO test_encrypt (secret_data, secret_number) VALUES (NULL, NULL);
+SELECT * FROM test_encrypt WHERE secret_data IS NULL;
+ id | secret_data | secret_number 
+----+-------------+---------------
+ 14 |             |              
+(1 row)
+
+-- Test index creation (index pages should also be encrypted)
+CREATE INDEX ON test_encrypt (secret_number);
+-- Use the index
+SELECT secret_data FROM test_encrypt WHERE secret_number = 12345;
+  secret_data   
+----------------
+ Updated secret
+(1 row)
+
+-- Clean up
+DROP TABLE test_encrypt;
+-- =============================================================================
+-- DDL Tests: Operations that change RelFileNumber
+-- These operations create new files and write records through storage hooks,
+-- so encryption/decryption works correctly.
+-- =============================================================================
+-- -----------------------------------------------------------------------------
+-- Test 1: TRUNCATE (creates new file, writes through hooks)
+-- -----------------------------------------------------------------------------
+CREATE TABLE test_truncate (id int, data text);
+INSERT INTO test_truncate VALUES (1, 'before truncate');
+SELECT * FROM test_truncate;
+ id |      data       
+----+-----------------
+  1 | before truncate
+(1 row)
+
+TRUNCATE test_truncate;
+-- Insert new data after truncate - works fine (new file, new encryption through hooks)
+INSERT INTO test_truncate VALUES (2, 'after truncate');
+SELECT * FROM test_truncate;
+ id |      data      
+----+----------------
+  2 | after truncate
+(1 row)
+
+DROP TABLE test_truncate;
+-- -----------------------------------------------------------------------------
+-- Test 2: CLUSTER (rewrites table through hooks)
+-- -----------------------------------------------------------------------------
+CREATE TABLE test_cluster (id int PRIMARY KEY, data text);
+INSERT INTO test_cluster SELECT g, 'data ' || g FROM generate_series(1, 100) g;
+CHECKPOINT;
+CLUSTER test_cluster USING test_cluster_pkey;
+-- Works fine - data rewritten through storage hooks
+SELECT COUNT(*) FROM test_cluster;
+ count 
+-------
+   100
+(1 row)
+
+SELECT * FROM test_cluster WHERE id = 50;
+ id |  data   
+----+---------
+ 50 | data 50
+(1 row)
+
+DROP TABLE test_cluster;
+-- -----------------------------------------------------------------------------
+-- Test 3: VACUUM FULL (rewrites table through hooks)
+-- -----------------------------------------------------------------------------
+CREATE TABLE test_vacuum_full (id int, data text);
+INSERT INTO test_vacuum_full SELECT g, 'data ' || g FROM generate_series(1, 100) g;
+DELETE FROM test_vacuum_full WHERE id > 50;
+CHECKPOINT;
+VACUUM FULL test_vacuum_full;
+-- Works fine - data rewritten through storage hooks
+SELECT COUNT(*) FROM test_vacuum_full;
+ count 
+-------
+    50
+(1 row)
+
+DROP TABLE test_vacuum_full;
+-- -----------------------------------------------------------------------------
+-- Test 4: REINDEX (rebuilds index through hooks)
+-- -----------------------------------------------------------------------------
+CREATE TABLE test_reindex (id int PRIMARY KEY, data text);
+INSERT INTO test_reindex SELECT g, 'data ' || g FROM generate_series(1, 100) g;
+CHECKPOINT;
+REINDEX INDEX test_reindex_pkey;
+-- Works fine - index rebuilt through storage hooks
+SET enable_seqscan = off;
+SELECT * FROM test_reindex WHERE id = 50;
+ id |  data   
+----+---------
+ 50 | data 50
+(1 row)
+
+RESET enable_seqscan;
+DROP TABLE test_reindex;
+-- =============================================================================
+-- Additional DDL Tests: Operations that change RelFileNumber or copy files
+-- These also go through storage hooks, so encryption/decryption works correctly.
+-- =============================================================================
+-- -----------------------------------------------------------------------------
+-- Test 5: ALTER TABLE SET TABLESPACE
+-- RelFileNumber changes, but data is copied through storage hooks
+-- -----------------------------------------------------------------------------
+\! mkdir -p /tmp/test_tde_tablespace
+CREATE TABLESPACE test_tde_tblspc LOCATION '/tmp/test_tde_tablespace';
+CREATE TABLE test_set_tablespace (id int, data text);
+INSERT INTO test_set_tablespace SELECT g, 'data ' || g FROM generate_series(1, 50) g;
+CHECKPOINT;
+-- Move to different tablespace - data copied through storage hooks
+ALTER TABLE test_set_tablespace SET TABLESPACE test_tde_tblspc;
+-- Works fine - data was re-encrypted with new RelFileNumber
+SELECT COUNT(*) FROM test_set_tablespace;
+ count 
+-------
+    50
+(1 row)
+
+DROP TABLE test_set_tablespace;
+DROP TABLESPACE test_tde_tblspc;
+\! rm -rf /tmp/test_tde_tablespace
diff --git a/contrib/test_tde/sql/basic.sql b/contrib/test_tde/sql/basic.sql
new file mode 100644
index 00000000000..9b2651afee8
--- /dev/null
+++ b/contrib/test_tde/sql/basic.sql
@@ -0,0 +1,146 @@
+-- Basic test for test_tde extension
+-- Verify that encryption/decryption works correctly
+
+-- Show current settings
+SHOW test_tde.key;
+
+-- Create a test table
+CREATE TABLE test_encrypt (
+    id serial PRIMARY KEY,
+    secret_data text,
+    secret_number integer
+);
+
+-- Insert some data
+INSERT INTO test_encrypt (secret_data, secret_number) VALUES
+    ('This is secret data', 12345),
+    ('Another secret message', 67890),
+    ('PostgreSQL TDE test', 11111);
+
+-- Force a checkpoint to ensure data is written to disk
+CHECKPOINT;
+
+-- Read data back - should be decrypted correctly
+SELECT * FROM test_encrypt ORDER BY id;
+
+-- Update some data
+UPDATE test_encrypt SET secret_data = 'Updated secret' WHERE id = 1;
+
+-- Verify update worked
+SELECT * FROM test_encrypt WHERE id = 1;
+
+-- Test with larger data
+INSERT INTO test_encrypt (secret_data, secret_number)
+SELECT
+    repeat('Large data block ', 100),
+    generate_series
+FROM generate_series(1, 10);
+
+-- Count rows
+SELECT COUNT(*) FROM test_encrypt;
+
+-- Test with NULL values
+INSERT INTO test_encrypt (secret_data, secret_number) VALUES (NULL, NULL);
+SELECT * FROM test_encrypt WHERE secret_data IS NULL;
+
+-- Test index creation (index pages should also be encrypted)
+CREATE INDEX ON test_encrypt (secret_number);
+
+-- Use the index
+SELECT secret_data FROM test_encrypt WHERE secret_number = 12345;
+
+-- Clean up
+DROP TABLE test_encrypt;
+
+-- =============================================================================
+-- DDL Tests: Operations that change RelFileNumber
+-- These operations create new files and write records through storage hooks,
+-- so encryption/decryption works correctly.
+-- =============================================================================
+
+-- -----------------------------------------------------------------------------
+-- Test 1: TRUNCATE (creates new file, writes through hooks)
+-- -----------------------------------------------------------------------------
+CREATE TABLE test_truncate (id int, data text);
+INSERT INTO test_truncate VALUES (1, 'before truncate');
+SELECT * FROM test_truncate;
+
+TRUNCATE test_truncate;
+
+-- Insert new data after truncate - works fine (new file, new encryption through hooks)
+INSERT INTO test_truncate VALUES (2, 'after truncate');
+SELECT * FROM test_truncate;
+
+DROP TABLE test_truncate;
+
+-- -----------------------------------------------------------------------------
+-- Test 2: CLUSTER (rewrites table through hooks)
+-- -----------------------------------------------------------------------------
+CREATE TABLE test_cluster (id int PRIMARY KEY, data text);
+INSERT INTO test_cluster SELECT g, 'data ' || g FROM generate_series(1, 100) g;
+CHECKPOINT;
+
+CLUSTER test_cluster USING test_cluster_pkey;
+
+-- Works fine - data rewritten through storage hooks
+SELECT COUNT(*) FROM test_cluster;
+SELECT * FROM test_cluster WHERE id = 50;
+
+DROP TABLE test_cluster;
+
+-- -----------------------------------------------------------------------------
+-- Test 3: VACUUM FULL (rewrites table through hooks)
+-- -----------------------------------------------------------------------------
+CREATE TABLE test_vacuum_full (id int, data text);
+INSERT INTO test_vacuum_full SELECT g, 'data ' || g FROM generate_series(1, 100) g;
+DELETE FROM test_vacuum_full WHERE id > 50;
+CHECKPOINT;
+
+VACUUM FULL test_vacuum_full;
+
+-- Works fine - data rewritten through storage hooks
+SELECT COUNT(*) FROM test_vacuum_full;
+
+DROP TABLE test_vacuum_full;
+
+-- -----------------------------------------------------------------------------
+-- Test 4: REINDEX (rebuilds index through hooks)
+-- -----------------------------------------------------------------------------
+CREATE TABLE test_reindex (id int PRIMARY KEY, data text);
+INSERT INTO test_reindex SELECT g, 'data ' || g FROM generate_series(1, 100) g;
+CHECKPOINT;
+
+REINDEX INDEX test_reindex_pkey;
+
+-- Works fine - index rebuilt through storage hooks
+SET enable_seqscan = off;
+SELECT * FROM test_reindex WHERE id = 50;
+RESET enable_seqscan;
+
+DROP TABLE test_reindex;
+
+-- =============================================================================
+-- Additional DDL Tests: Operations that change RelFileNumber or copy files
+-- These also go through storage hooks, so encryption/decryption works correctly.
+-- =============================================================================
+
+-- -----------------------------------------------------------------------------
+-- Test 5: ALTER TABLE SET TABLESPACE
+-- RelFileNumber changes, but data is copied through storage hooks
+-- -----------------------------------------------------------------------------
+\! mkdir -p /tmp/test_tde_tablespace
+CREATE TABLESPACE test_tde_tblspc LOCATION '/tmp/test_tde_tablespace';
+
+CREATE TABLE test_set_tablespace (id int, data text);
+INSERT INTO test_set_tablespace SELECT g, 'data ' || g FROM generate_series(1, 50) g;
+CHECKPOINT;
+
+-- Move to different tablespace - data copied through storage hooks
+ALTER TABLE test_set_tablespace SET TABLESPACE test_tde_tblspc;
+
+-- Works fine - data was re-encrypted with new RelFileNumber
+SELECT COUNT(*) FROM test_set_tablespace;
+
+DROP TABLE test_set_tablespace;
+DROP TABLESPACE test_tde_tblspc;
+\! rm -rf /tmp/test_tde_tablespace
diff --git a/contrib/test_tde/test_tde.c b/contrib/test_tde/test_tde.c
new file mode 100644
index 00000000000..f70359f1c26
--- /dev/null
+++ b/contrib/test_tde/test_tde.c
@@ -0,0 +1,1131 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_tde.c
+ *		Reference implementation for Storage I/O Transform Hooks
+ *
+ * WARNING: This is for TESTING ONLY. Do not use in production.
+ *	- Key stored in plaintext GUC
+ *	- No key rotation
+ *	- Minimal error handling
+ *	- Not audited for security
+ *
+ * For production TDE, use a dedicated extension project.
+ *
+ * This extension demonstrates how to use the storage I/O transform hooks
+ * for transparent data encryption. It uses AES-256-CTR for encryption
+ * with IV derived from page metadata and block location.
+ *
+ * Author: Henson Choi <[email protected]>
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/test_tde/test_tde.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <openssl/err.h>
+#include <openssl/evp.h>
+#include <string.h>
+
+#include "access/transam.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogreader.h"
+#include "access/xlogrecord.h"
+#include "catalog/pg_tablespace_d.h"
+#include "fmgr.h"
+#include "port/pg_crc32c.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+#include "storage/md.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+
+PG_MODULE_MAGIC_EXT(
+					.name = "test_tde",
+					.version = PG_VERSION
+);
+
+/* ----------
+ * GUC variables
+ * ----------
+ */
+static char *test_tde_key_hex = NULL;	/* 64 hex chars = 256 bits */
+
+/* ----------
+ * Module state
+ * ----------
+ */
+
+/*
+ * Memory context for encryption buffers.
+ * Allows allocation in critical sections (for WAL encryption).
+ */
+static MemoryContext test_tde_cxt = NULL;
+
+/*
+ * Transform ID for this extension.
+ * Value 1 means page is encrypted with test_tde.
+ * Value 0 means page is not transformed (plaintext).
+ */
+#define TEST_TDE_TRANSFORM_ID	1
+
+/*
+ * Dynamic buffers for encrypted pages.
+ * Grows as needed, freed in _PG_fini.
+ */
+static char *encrypt_buffer = NULL;
+static const void **encrypt_buffer_ptrs = NULL;
+static BlockNumber encrypt_buffer_nblocks = 0;
+
+/*
+ * WAL encryption buffer - allocated from test_tde_cxt which allows
+ * allocation in critical sections via MemoryContextAllowInCriticalSection().
+ */
+static char *wal_encrypt_buffer = NULL;
+static Size wal_encrypt_buffer_size = 0;
+
+/*
+ * WAL decryption buffer - static, only needed for records within a single page.
+ * When inplace_allowed=false, record doesn't cross page boundary, so max size
+ * is XLOG_BLCKSZ.
+ */
+static char wal_decrypt_buffer[XLOG_BLCKSZ];
+
+/*
+ * Pre-allocated OpenSSL cipher context.
+ * Created in _PG_init() and reused for all encrypt/decrypt operations.
+ * This avoids memory allocation in critical sections.
+ */
+static EVP_CIPHER_CTX *cipher_ctx = NULL;
+
+/*
+ * Transformed WAL record structure (using XLR_BLOCK_ID_TRANSFORMED from xlogrecord.h):
+ *   [XLogRecord header]
+ *   [block_id=251 (1B)]
+ *   [payload_length (4B)]
+ *   [IV (16B)]
+ *   [encrypted payload]
+ *
+ * The block ID 251 marks this record as transformed. After decryption,
+ * the marker, length, and IV are removed, restoring the original structure.
+ * If decryption is not performed, the unknown block ID causes parse failure.
+ *
+ * Note: The 21-byte overhead may temporarily cause xl_tot_len to exceed
+ * XLogRecordMaxSize after encryption. This is safe because:
+ * - XLogRecordMaxSize is only checked in XLogRecordAssemble() before our hook
+ * - XLogInsertRecord() does not re-validate the size
+ * - The decode hook removes the overhead before WAL parsing, restoring the
+ *   original size which was already validated
+ */
+#define WAL_ENCRYPT_IV_SIZE			16
+#define WAL_ENCRYPT_OVERHEAD		(SizeOfXLogRecordDataHeaderLong + WAL_ENCRYPT_IV_SIZE)
+#define WAL_CRC_SIZE			sizeof(pg_crc32c)	/* 4 bytes */
+#define WAL_IV_RANDOM_SIZE		(WAL_ENCRYPT_IV_SIZE - WAL_CRC_SIZE)	/* 12 bytes */
+
+/* Static XLogRecData for returning encrypted WAL */
+static XLogRecData wal_rdata_head;
+
+/* Previous hook values (for chaining) */
+static mdread_post_hook_type prev_mdread_post_hook = NULL;
+static mdwrite_pre_hook_type prev_mdwrite_pre_hook = NULL;
+static mdextend_pre_hook_type prev_mdextend_pre_hook = NULL;
+static xlog_insert_pre_hook_type prev_xlog_insert_pre_hook = NULL;
+static xlog_decode_pre_hook_type prev_xlog_decode_pre_hook = NULL;
+
+/* ----------
+ * Function declarations
+ * ----------
+ */
+
+/* Module entry points */
+void		_PG_init(void);
+void		_PG_fini(void);
+
+/* GUC callbacks */
+static bool check_test_tde_key(char **newval, void **extra, GucSource source);
+
+/* Hook functions */
+static void test_tde_mdread_post(RelFileLocator *rlocator, ForkNumber forknum,
+								 BlockNumber blocknum, void **buffers,
+								 BlockNumber nblocks);
+static const void **test_tde_mdwrite_pre(RelFileLocator *rlocator,
+										 ForkNumber forknum,
+										 BlockNumber blocknum,
+										 const void **buffers,
+										 BlockNumber nblocks);
+static const void *test_tde_mdextend_pre(RelFileLocator *rlocator,
+										 ForkNumber forknum,
+										 BlockNumber blocknum,
+										 const void *buffer);
+static struct XLogRecData *test_tde_xlog_insert_pre(struct XLogRecData *rdata);
+static XLogRecord *test_tde_xlog_decode_pre(XLogReaderState *state,
+											XLogRecord *record,
+											XLogRecPtr lsn,
+											bool inplace_allowed);
+
+/* Internal helper functions */
+static void ensure_encrypt_buffer(BlockNumber nblocks);
+static bool parse_hex_key(const char *hex, unsigned char *out, int outlen);
+static void derive_iv(unsigned char *iv, RelFileLocator *rlocator,
+					  BlockNumber blocknum, XLogRecPtr lsn);
+static void transform_data(const unsigned char *in, unsigned char *out,
+						   int len, const unsigned char *iv);
+static bool should_transform(RelFileLocator *rlocator, ForkNumber forknum);
+
+
+/* ----------
+ * Internal helper functions
+ * ----------
+ */
+
+/*
+ * Parse hex string to bytes
+ */
+static bool
+parse_hex_key(const char *hex, unsigned char *out, int outlen)
+{
+	int			i;
+	int			hexlen;
+
+	if (hex == NULL)
+		return false;
+
+	hexlen = strlen(hex);
+	if (hexlen != outlen * 2)
+		return false;
+
+	for (i = 0; i < outlen; i++)
+	{
+		int			hi,
+					lo;
+		char		c;
+
+		c = hex[i * 2];
+		if (c >= '0' && c <= '9')
+			hi = c - '0';
+		else if (c >= 'a' && c <= 'f')
+			hi = c - 'a' + 10;
+		else if (c >= 'A' && c <= 'F')
+			hi = c - 'A' + 10;
+		else
+			return false;
+
+		c = hex[i * 2 + 1];
+		if (c >= '0' && c <= '9')
+			lo = c - '0';
+		else if (c >= 'a' && c <= 'f')
+			lo = c - 'a' + 10;
+		else if (c >= 'A' && c <= 'F')
+			lo = c - 'A' + 10;
+		else
+			return false;
+
+		out[i] = (hi << 4) | lo;
+	}
+
+	return true;
+}
+
+/*
+ * Ensure encrypt buffer can hold 'nblocks' pages.
+ * Grows by 2x when needed. Uses test_tde_cxt for persistence.
+ */
+static void
+ensure_encrypt_buffer(BlockNumber nblocks)
+{
+	if (encrypt_buffer_nblocks >= nblocks)
+		return;
+
+	if (encrypt_buffer == NULL)
+	{
+		BlockNumber initial = Max(8, nblocks);
+		Size		size = (Size) initial * BLCKSZ;
+
+		encrypt_buffer = MemoryContextAllocAligned(test_tde_cxt, size,
+												   PG_IO_ALIGN_SIZE, 0);
+		encrypt_buffer_ptrs = MemoryContextAlloc(test_tde_cxt,
+												 initial * sizeof(void *));
+		encrypt_buffer_nblocks = initial;
+	}
+	else
+	{
+		BlockNumber new_nblocks = encrypt_buffer_nblocks;
+		Size		new_size;
+
+		while (new_nblocks < nblocks)
+			new_nblocks *= 2;
+
+		new_size = (Size) new_nblocks * BLCKSZ;
+
+		/* repalloc doesn't preserve alignment, so allocate new and copy */
+		{
+			char	   *new_buffer = MemoryContextAllocAligned(test_tde_cxt,
+															   new_size,
+															   PG_IO_ALIGN_SIZE, 0);
+
+			memcpy(new_buffer, encrypt_buffer,
+				   (Size) encrypt_buffer_nblocks * BLCKSZ);
+			pfree(encrypt_buffer);
+			encrypt_buffer = new_buffer;
+		}
+
+		encrypt_buffer_ptrs = repalloc(encrypt_buffer_ptrs,
+									   new_nblocks * sizeof(void *));
+		encrypt_buffer_nblocks = new_nblocks;
+	}
+
+	/* Update pointers array */
+	for (BlockNumber i = 0; i < encrypt_buffer_nblocks; i++)
+		encrypt_buffer_ptrs[i] = encrypt_buffer + (Size) i * BLCKSZ;
+}
+
+
+/*
+ * Derive IV from page location and header
+ *
+ * IV structure (16 bytes) - simple, deterministic layout:
+ *
+ * AES-CTR mode only requires IV uniqueness, not randomness.
+ * The combination of LSN + RelFileNumber + BlockNumber guarantees uniqueness:
+ *   - LSN: Globally unique across entire WAL stream
+ *   - RelFileNumber: Unique within database
+ *   - BlockNumber: Unique within relation
+ *
+ * Even when a single WAL record modifies multiple pages (e.g., B-tree split),
+ * the BlockNumber distinguishes each page.
+ *
+ * Layout (high entropy bytes first, low entropy bytes last for CTR counter space):
+ *   [0-3]   LSN low 32 bits - changes frequently (high entropy)
+ *   [4-5]   LSN bits 32-47 - mid entropy
+ *   [6-8]   BlockNumber low 24 bits
+ *   [9-11]  RelFileNumber low 24 bits
+ *   [12]    BlockNumber high 8 bits - usually 0 for small tables
+ *   [13]    RelFileNumber high 8 bits - usually 0
+ *   [14-15] LSN bits 48-63 - usually 0, counter space for CTR
+ *
+ * CTR counter space analysis:
+ *   - Page size: 8KB, encrypted area: 8168 bytes (excluding 24-byte header)
+ *   - AES block size: 16 bytes
+ *   - AES blocks per page: ceil(8168/16) = 511, so at most 510 increments
+ *   - The counter is big-endian over all 16 bytes, so increments land in
+ *     IV[14-15] and carry into IV[13] only if IV[14-15] exceeds 0xFE01
+ *   - Collision requires same IV[0-11], which means same LSN+BlockNum+RelNum
+ *
+ * Note: spcOid, dbOid not used - RelFileNumber is sufficient for uniqueness.
+ *
+ * Known limitation: Operations that copy/move files while changing
+ * RelFileNumber without going through storage hooks cause decryption failure.
+ */
+static void
+derive_iv(unsigned char *iv, RelFileLocator *rlocator,
+		  BlockNumber blocknum, XLogRecPtr lsn)
+{
+
+	/*
+	 * Layout: High entropy first, low entropy (usually 0) last.
+	 * [LSN low 4B][LSN mid 2B][BlockNum low 3B][RelNum low 3B]
+	 * [BlockNum high 1B][RelNum high 1B][LSN high 2B]
+	 */
+
+	/* LSN low 32 bits - bytes 0-3 (high entropy, changes frequently) */
+	iv[0] = (uint8) ((lsn >> 0) & 0xFF);
+	iv[1] = (uint8) ((lsn >> 8) & 0xFF);
+	iv[2] = (uint8) ((lsn >> 16) & 0xFF);
+	iv[3] = (uint8) ((lsn >> 24) & 0xFF);
+
+	/* LSN bits 32-47 - bytes 4-5 (mid entropy) */
+	iv[4] = (uint8) ((lsn >> 32) & 0xFF);
+	iv[5] = (uint8) ((lsn >> 40) & 0xFF);
+
+	/* BlockNumber low 24 bits - bytes 6-8 */
+	iv[6] = (uint8) ((blocknum >> 0) & 0xFF);
+	iv[7] = (uint8) ((blocknum >> 8) & 0xFF);
+	iv[8] = (uint8) ((blocknum >> 16) & 0xFF);
+
+	/* RelFileNumber low 24 bits - bytes 9-11 */
+	iv[9] = (uint8) ((rlocator->relNumber >> 0) & 0xFF);
+	iv[10] = (uint8) ((rlocator->relNumber >> 8) & 0xFF);
+	iv[11] = (uint8) ((rlocator->relNumber >> 16) & 0xFF);
+
+	/* BlockNumber high 8 bits - byte 12 (usually 0 for small tables) */
+	iv[12] = (uint8) ((blocknum >> 24) & 0xFF);
+
+	/* RelFileNumber high 8 bits - byte 13 (usually 0) */
+	iv[13] = (uint8) ((rlocator->relNumber >> 24) & 0xFF);
+
+	/* LSN bits 48-63 - bytes 14-15 (usually 0, counter space for CTR) */
+	iv[14] = (uint8) ((lsn >> 48) & 0xFF);
+	iv[15] = (uint8) ((lsn >> 56) & 0xFF);
+}
+
+/*
+ * Encrypt or decrypt data using AES-256-CTR
+ *
+ * AES-CTR is symmetric: encrypt and decrypt use the same operation.
+ */
+static void
+transform_data(const unsigned char *in, unsigned char *out, int len,
+			   const unsigned char *iv)
+{
+	int			outlen,
+				tmplen;
+
+	if (len <= 0)
+		return;
+
+	/*
+	 * cipher_ctx is pre-allocated and initialized with cipher/key in _PG_init().
+	 * Here we only set IV (cipher=NULL, key=NULL), which avoids internal
+	 * memory allocation. This is critical for WAL encryption which runs
+	 * inside critical sections. We use PANIC for all errors.
+	 */
+	if (cipher_ctx == NULL)
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("test_tde: cipher context not initialized")));
+
+	if (EVP_EncryptInit_ex(cipher_ctx, NULL, NULL, NULL, iv) != 1)
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("test_tde: EVP_EncryptInit_ex failed: %s",
+						ERR_error_string(ERR_get_error(), NULL))));
+
+	if (EVP_EncryptUpdate(cipher_ctx, out, &outlen, in, len) != 1)
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("test_tde: EVP_EncryptUpdate failed: %s",
+						ERR_error_string(ERR_get_error(), NULL))));
+
+	if (EVP_EncryptFinal_ex(cipher_ctx, out + outlen, &tmplen) != 1)
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("test_tde: EVP_EncryptFinal_ex failed: %s",
+						ERR_error_string(ERR_get_error(), NULL))));
+}
+
+/*
+ * Check if we should encrypt/decrypt this relation
+ *
+ * For this test implementation, we encrypt only user-created relations.
+ * A production implementation would check encryption policies.
+ */
+static bool
+should_transform(RelFileLocator *rlocator, ForkNumber forknum)
+{
+	/* Skip if cipher not initialized (key not configured) */
+	if (cipher_ctx == NULL)
+		return false;
+
+	/* Skip shared catalogs, which live in the pg_global tablespace */
+	if (rlocator->spcOid == GLOBALTABLESPACE_OID)
+		return false;
+
+	/*
+	 * Skip system catalogs (relfilenumber < FirstNormalObjectId). This ensures
+	 * we don't try to encrypt/decrypt pre-existing system catalog pages that
+	 * were created without encryption.
+	 */
+	if (rlocator->relNumber < FirstNormalObjectId)
+		return false;
+
+	(void) forknum;				/* all forks are encrypted for user tables */
+
+	return true;
+}
+
+
+/* ----------
+ * Hook functions - Page I/O
+ * ----------
+ */
+
+/*
+ * Post-read hook: decrypt blocks after reading from disk
+ */
+static void
+test_tde_mdread_post(RelFileLocator *rlocator, ForkNumber forknum,
+					 BlockNumber blocknum, void **buffers,
+					 BlockNumber nblocks)
+{
+	BlockNumber i;
+	unsigned char iv[16];
+
+	/* Chain to previous hook if any */
+	if (prev_mdread_post_hook)
+		prev_mdread_post_hook(rlocator, forknum, blocknum, buffers, nblocks);
+
+	for (i = 0; i < nblocks; i++)
+	{
+		PageHeader	phdr = (PageHeader) buffers[i];
+		uint16		checksum;
+		uint8		transform_id;
+
+		/* Skip empty/new pages */
+		if (PageIsNew((Page) buffers[i]))
+			continue;
+
+		/* Skip if page doesn't look valid */
+		if (phdr->pd_lower < SizeOfPageHeaderData ||
+			phdr->pd_lower > phdr->pd_upper ||
+			phdr->pd_upper > phdr->pd_special ||
+			phdr->pd_special > BLCKSZ)
+			continue;
+
+		/* Check transform ID - skip if page is not encrypted by us */
+		transform_id = PageGetTransformId((Page) buffers[i]);
+		if (transform_id == PD_TRANSFORM_NONE)
+			continue;	/* Page is not encrypted */
+
+		if (transform_id != TEST_TDE_TRANSFORM_ID)
+		{
+			elog(DEBUG1, "test_tde: skipping block %u with transform ID %u (not ours)",
+				 blocknum + i, transform_id);
+			continue;
+		}
+
+		/* Page is encrypted but cipher not initialized - fatal error */
+		if (cipher_ctx == NULL)
+			ereport(PANIC,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("test_tde: encrypted page found but encryption key not configured"),
+					 errdetail("Block %u of relation %u/%u/%u fork %d has transform ID %u.",
+							   blocknum + i, rlocator->spcOid, rlocator->dbOid,
+							   rlocator->relNumber, forknum, transform_id)));
+
+		/* Verify checksum on encrypted data before decryption */
+		if (DataChecksumsEnabled())
+		{
+			checksum = pg_checksum_page((char *) buffers[i], blocknum + i);
+			if (checksum != phdr->pd_checksum)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg("page verification failed, calculated checksum %u but expected %u",
+								checksum, phdr->pd_checksum)));
+			}
+		}
+
+		/* Derive IV using LSN from page header */
+		derive_iv(iv, rlocator, blocknum + i, PageGetLSN((Page) buffers[i]));
+
+		/* Decrypt data area in place (header stays unchanged) */
+		transform_data((unsigned char *) buffers[i] + SizeOfPageHeaderData,
+					   (unsigned char *) buffers[i] + SizeOfPageHeaderData,
+					   BLCKSZ - SizeOfPageHeaderData, iv);
+
+		/* Clear transform ID and recalculate checksum for plaintext data */
+		PageSetTransformId((Page) buffers[i], PD_TRANSFORM_NONE);
+		PageSetChecksumInplace((Page) buffers[i], blocknum + i);
+	}
+}
+
+/*
+ * Helper: encrypt a single page into the encrypt_buffer at given offset.
+ * Returns pointer to encrypted page, or original buffer if page was skipped.
+ */
+static const void *
+encrypt_page(RelFileLocator *rlocator, BlockNumber blocknum,
+			 const void *buffer, Size buffer_offset)
+{
+	unsigned char iv[16];
+	PageHeader	phdr = (PageHeader) buffer;
+	char	   *dest = encrypt_buffer + buffer_offset;
+
+	/* Skip empty/new pages */
+	if (PageIsNew((Page) buffer))
+		return buffer;
+
+	/* Skip if page doesn't look valid */
+	if (phdr->pd_lower < SizeOfPageHeaderData ||
+		phdr->pd_lower > phdr->pd_upper ||
+		phdr->pd_upper > phdr->pd_special ||
+		phdr->pd_special > BLCKSZ)
+		return buffer;
+
+	/* Derive IV using LSN from page header */
+	derive_iv(iv, rlocator, blocknum, PageGetLSN((Page) buffer));
+
+	/* Copy header, encrypt data area */
+	memcpy(dest, buffer, SizeOfPageHeaderData);
+	transform_data((unsigned char *) buffer + SizeOfPageHeaderData,
+				   (unsigned char *) dest + SizeOfPageHeaderData,
+				   BLCKSZ - SizeOfPageHeaderData, iv);
+
+	/* Set transform ID to mark page as encrypted */
+	PageSetTransformId((Page) dest, TEST_TDE_TRANSFORM_ID);
+
+	/* Recalculate checksum for encrypted data */
+	PageSetChecksumInplace((Page) dest, blocknum);
+
+	return dest;
+}
+
+/*
+ * Pre-write hook: encrypt blocks before writing to disk
+ */
+static const void **
+test_tde_mdwrite_pre(RelFileLocator *rlocator, ForkNumber forknum,
+					 BlockNumber blocknum, const void **buffers,
+					 BlockNumber nblocks)
+{
+	BlockNumber i;
+
+	/* Chain to previous hook if any */
+	if (prev_mdwrite_pre_hook)
+		buffers = prev_mdwrite_pre_hook(rlocator, forknum, blocknum, buffers, nblocks);
+
+	if (!should_transform(rlocator, forknum))
+		return buffers;
+
+	/* Ensure buffer is large enough */
+	ensure_encrypt_buffer(nblocks);
+
+	for (i = 0; i < nblocks; i++)
+		encrypt_buffer_ptrs[i] = encrypt_page(rlocator, blocknum + i,
+											  buffers[i], (Size) i * BLCKSZ);
+
+	return encrypt_buffer_ptrs;
+}
+
+/*
+ * Pre-extend hook: encrypt block before extending relation
+ */
+static const void *
+test_tde_mdextend_pre(RelFileLocator *rlocator, ForkNumber forknum,
+					  BlockNumber blocknum, const void *buffer)
+{
+	/* Chain to previous hook if any */
+	if (prev_mdextend_pre_hook)
+		buffer = prev_mdextend_pre_hook(rlocator, forknum, blocknum, buffer);
+
+	if (!should_transform(rlocator, forknum))
+		return buffer;
+
+	/* Ensure buffer is large enough for at least 1 block */
+	ensure_encrypt_buffer(1);
+
+	return encrypt_page(rlocator, blocknum, buffer, 0);
+}
+
+
+/* ----------
+ * Hook functions - WAL I/O
+ * ----------
+ */
+
+/*
+ * Ensure WAL encryption buffer is large enough.
+ * Uses test_tde_cxt which allows allocation in critical sections.
+ */
+static void
+ensure_wal_encrypt_buffer(Size needed)
+{
+	if (wal_encrypt_buffer_size >= needed)
+		return;
+
+	if (wal_encrypt_buffer == NULL)
+		wal_encrypt_buffer = MemoryContextAlloc(test_tde_cxt, needed);
+	else
+		wal_encrypt_buffer = repalloc(wal_encrypt_buffer, needed);
+	wal_encrypt_buffer_size = needed;
+}
+
+/*
+ * WAL insert pre-hook: encrypt WAL record data
+ *
+ * Strategy:
+ * 1. Copy XLogRecord header and payload
+ * 2. Save plaintext CRC from header (xl_crc contains payload CRC at this point)
+ * 3. Build IV: [plaintext CRC (4B)] [random (12B)]
+ * 4. Insert transformation header (block ID 251 + payload_length) and IV
+ * 5. Encrypt original payload with the IV
+ * 6. Update xl_tot_len and recalculate CRC for encrypted payload
+ *
+ * Resulting record structure:
+ *   [XLogRecord header]
+ *   [block_id=251 (1B)]
+ *   [payload_length (4B)]
+ *   [IV 16B]
+ *   [encrypted payload]
+ *
+ * The block ID 251 marks this record as encrypted. After decryption,
+ * the marker, length, and IV are removed, restoring the original structure.
+ * If decryption is not performed, the unknown block ID causes parse failure.
+ */
+static struct XLogRecData *
+test_tde_xlog_insert_pre(struct XLogRecData *rdata)
+{
+	XLogRecData *node;
+	XLogRecord *rechdr;
+	char	   *bufptr;
+	char	   *new_payload_start;
+	uint32		orig_total_len;
+	uint32		orig_payload_len;
+	uint32		new_total_len;
+	uint32		transform_payload_len;
+	unsigned char iv[WAL_ENCRYPT_IV_SIZE];
+	pg_crc32c	plaintext_crc;
+
+	/* Chain to previous hook if any */
+	if (prev_xlog_insert_pre_hook)
+		rdata = prev_xlog_insert_pre_hook(rdata);
+
+	/* Skip if cipher not initialized (key not configured) */
+	if (cipher_ctx == NULL)
+		return rdata;
+
+	/* First node must contain XLogRecord header */
+	if (rdata == NULL || rdata->data == NULL || rdata->len < SizeOfXLogRecord)
+		return rdata;
+
+	rechdr = (XLogRecord *) rdata->data;
+	orig_total_len = rechdr->xl_tot_len;
+
+	/* Sanity check before subtracting, so the unsigned length cannot wrap */
+	if (orig_total_len < SizeOfXLogRecord)
+		return rdata;
+	orig_payload_len = orig_total_len - SizeOfXLogRecord;
+
+	/*
+	 * Skip records with no payload (e.g., XLOG_SWITCH). These are header-only
+	 * records where adding encryption overhead would break size assertions.
+	 */
+	if (orig_payload_len == 0)
+		return rdata;
+
+	new_total_len = orig_total_len + WAL_ENCRYPT_OVERHEAD;
+
+	/*
+	 * Save plaintext CRC before we modify anything.
+	 * At this point, xl_crc contains the CRC of the payload only
+	 * (header CRC is added later by XLogInsertRecord).
+	 */
+	plaintext_crc = rechdr->xl_crc;
+
+	/*
+	 * Ensure buffer is large enough. test_tde_cxt allows allocation in
+	 * critical sections, so this is safe even during WAL insertion.
+	 * OOM here will cause PANIC, which is acceptable for critical sections.
+	 */
+	ensure_wal_encrypt_buffer(new_total_len);
+
+	/*
+	 * Build IV: [plaintext CRC (4B)] [random (12B)]
+	 * Store CRC directly in IV[0..3] (little-endian).
+	 */
+	iv[0] = ((uint32) plaintext_crc >> 0) & 0xFF;
+	iv[1] = ((uint32) plaintext_crc >> 8) & 0xFF;
+	iv[2] = ((uint32) plaintext_crc >> 16) & 0xFF;
+	iv[3] = ((uint32) plaintext_crc >> 24) & 0xFF;
+
+	/* Generate random bytes for IV[4..15] (12 bytes) for uniqueness */
+	if (!pg_strong_random(iv + WAL_CRC_SIZE, WAL_IV_RANDOM_SIZE))
+	{
+		ereport(WARNING,
+				(errmsg("test_tde: failed to generate random IV for WAL")));
+		return rdata;
+	}
+
+	/*
+	 * Build encrypted record in buffer:
+	 * [header][block_id][payload_length][IV][encrypted_payload]
+	 */
+	bufptr = wal_encrypt_buffer;
+
+	/* 1. Copy header from first rdata node */
+	memcpy(bufptr, rdata->data, SizeOfXLogRecord);
+	bufptr += SizeOfXLogRecord;
+
+	/* 2. Insert transformation header (block ID 251 + payload_length) */
+	new_payload_start = bufptr;
+	*bufptr = (char) XLR_BLOCK_ID_TRANSFORMED;
+	bufptr += sizeof(uint8);
+
+	/* Calculate payload_length: IV + encrypted payload */
+	transform_payload_len = WAL_ENCRYPT_IV_SIZE + orig_payload_len;
+
+	/* Store payload_length (4 bytes, unaligned, little-endian) */
+	bufptr[0] = (char) ((transform_payload_len >> 0) & 0xFF);
+	bufptr[1] = (char) ((transform_payload_len >> 8) & 0xFF);
+	bufptr[2] = (char) ((transform_payload_len >> 16) & 0xFF);
+	bufptr[3] = (char) ((transform_payload_len >> 24) & 0xFF);
+	bufptr += sizeof(uint32);
+
+	/* 3. Insert IV (CRC in first 4 bytes, random in remaining 12) */
+	memcpy(bufptr, iv, WAL_ENCRYPT_IV_SIZE);
+	bufptr += WAL_ENCRYPT_IV_SIZE;
+
+	/* 4. Copy payload to buffer, then encrypt in-place */
+	if (orig_payload_len > 0)
+	{
+		Size		first_node_payload;
+		char	   *encrypt_start = bufptr;
+
+		/* First node: skip header, copy remaining payload */
+		first_node_payload = rdata->len - SizeOfXLogRecord;
+		if (first_node_payload > 0)
+		{
+			memcpy(bufptr, (char *) rdata->data + SizeOfXLogRecord, first_node_payload);
+			bufptr += first_node_payload;
+		}
+
+		/* Remaining nodes: copy all data */
+		for (node = rdata->next; node != NULL; node = node->next)
+		{
+			if (node->len > 0 && node->data != NULL)
+			{
+				memcpy(bufptr, node->data, node->len);
+				bufptr += node->len;
+			}
+		}
+
+		/* Encrypt payload in-place */
+		transform_data((unsigned char *) encrypt_start,
+					   (unsigned char *) encrypt_start,
+					   orig_payload_len, iv);
+	}
+
+	/* Update header with new total length */
+	rechdr = (XLogRecord *) wal_encrypt_buffer;
+	rechdr->xl_tot_len = new_total_len;
+
+	/*
+	 * Recalculate CRC for the new payload (marker + length + IV + encrypted data).
+	 * The header CRC will be added by XLogInsertRecord later.
+	 */
+	{
+		pg_crc32c	crc;
+
+		INIT_CRC32C(crc);
+		COMP_CRC32C(crc, new_payload_start, new_total_len - SizeOfXLogRecord);
+		rechdr->xl_crc = crc;
+	}
+
+	/* Return single XLogRecData pointing to our encrypted buffer */
+	wal_rdata_head.next = NULL;
+	wal_rdata_head.data = wal_encrypt_buffer;
+	wal_rdata_head.len = new_total_len;
+
+	return &wal_rdata_head;
+}
+
+/*
+ * WAL decode pre-hook: decrypt WAL record data
+ *
+ * This reverses the encryption done in xlog_insert_pre_hook.
+ * Checks for block ID 251 marker to identify encrypted records.
+ *
+ * Input:  [header] [block_id=251 (1B)] [payload_length (4B)] [IV 16B] [encrypted payload]
+ * Output: [header] [original payload] (shorter by 21 bytes)
+ *
+ * Recovery process:
+ * 1. Check for encryption marker (block ID 251)
+ * 2. Read payload_length from transform header
+ * 3. Extract IV for decryption
+ * 4. Decrypt payload using IV
+ * 5. Extract plaintext payload CRC from IV[0..3]
+ * 6. Restore original record structure
+ *
+ * If the marker is not found, record is not encrypted (pass through).
+ * If inplace_allowed, decrypts in place. Otherwise, copies to static buffer.
+ */
+static XLogRecord *
+test_tde_xlog_decode_pre(XLogReaderState *state, XLogRecord *record,
+						 XLogRecPtr lsn, bool inplace_allowed)
+{
+	uint32		total_len;
+	uint32		transform_payload_len;
+	uint32		encrypted_payload_len;
+	unsigned char iv[WAL_ENCRYPT_IV_SIZE];
+	char	   *payload_start;
+	char	   *len_ptr;
+	XLogRecord *work_record;
+
+	/* Chain to previous hook if any */
+	if (prev_xlog_decode_pre_hook)
+		record = prev_xlog_decode_pre_hook(state, record, lsn, inplace_allowed);
+
+	if (record == NULL)
+		return record;
+
+	total_len = record->xl_tot_len;
+
+	/* Must have at least header + transform header + IV */
+	if (total_len < SizeOfXLogRecord + WAL_ENCRYPT_OVERHEAD)
+		return record;
+
+	/* Check for transformation marker (block ID 251) */
+	payload_start = (char *) record + SizeOfXLogRecord;
+	if ((unsigned char) *payload_start != XLR_BLOCK_ID_TRANSFORMED)
+		return record;			/* Not transformed, pass through */
+
+	/* WAL is encrypted but cipher not initialized - fatal error */
+	if (cipher_ctx == NULL)
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("test_tde: encrypted WAL record found but encryption key not configured"),
+				 errdetail("WAL record at LSN %X/%X has transformation marker.",
+						   LSN_FORMAT_ARGS(lsn))));
+
+	/*
+	 * If in-place modification is allowed, work directly on the record.
+	 * Otherwise, copy it to a static buffer (the record fits in a single page).
+	 */
+	if (inplace_allowed)
+	{
+		work_record = record;
+	}
+	else
+	{
+		/* Record within single page, must fit in XLOG_BLCKSZ */
+		if (total_len > XLOG_BLCKSZ)
+		{
+			ereport(WARNING,
+					(errmsg("test_tde: WAL record too large for decryption buffer")));
+			return record;
+		}
+		memcpy(wal_decrypt_buffer, record, total_len);
+		work_record = (XLogRecord *) wal_decrypt_buffer;
+	}
+
+	/* Recalculate payload_start for work_record */
+	payload_start = (char *) work_record + SizeOfXLogRecord;
+
+	/* Read payload_length from transform header (4 bytes, unaligned, little-endian) */
+	len_ptr = payload_start + sizeof(uint8);
+	transform_payload_len = ((uint32) (unsigned char) len_ptr[0] << 0) |
+							((uint32) (unsigned char) len_ptr[1] << 8) |
+							((uint32) (unsigned char) len_ptr[2] << 16) |
+							((uint32) (unsigned char) len_ptr[3] << 24);
+
+	/* Validate payload_length */
+	if (transform_payload_len < WAL_ENCRYPT_IV_SIZE ||
+		transform_payload_len > total_len - SizeOfXLogRecord - SizeOfXLogRecordDataHeaderLong)
+	{
+		ereport(WARNING,
+				(errmsg("test_tde: invalid transform payload length %u at LSN %X/%X",
+						transform_payload_len, LSN_FORMAT_ARGS(lsn))));
+		return record;
+	}
+
+	/* Extract IV (after transform header) */
+	memcpy(iv, payload_start + SizeOfXLogRecordDataHeaderLong, WAL_ENCRYPT_IV_SIZE);
+
+	/* Encrypted payload length = transform_payload_len - IV */
+	encrypted_payload_len = transform_payload_len - WAL_ENCRYPT_IV_SIZE;
+
+	/*
+	 * Decrypt the payload to payload_start, overwriting the transform header
+	 * and IV. Source (payload_start + 21) and destination overlap whenever
+	 * the payload exceeds 21 bytes, so transform_data() must tolerate that.
+	 */
+	if (encrypted_payload_len > 0)
+	{
+		transform_data((unsigned char *) (payload_start + WAL_ENCRYPT_OVERHEAD),
+					   (unsigned char *) payload_start,
+					   encrypted_payload_len, iv);
+	}
+
+	/* Update header with original length (transform header and IV removed) */
+	work_record->xl_tot_len = SizeOfXLogRecord + encrypted_payload_len;
+
+	/*
+	 * Recover plaintext payload CRC from IV[0..3] (little-endian).
+	 */
+	{
+		pg_crc32c	recovered_payload_crc;
+		pg_crc32c	full_crc;
+
+		/* Extract CRC directly from IV[0..3] */
+		recovered_payload_crc = (pg_crc32c) (((uint32) iv[0] << 0) |
+											 ((uint32) iv[1] << 8) |
+											 ((uint32) iv[2] << 16) |
+											 ((uint32) iv[3] << 24));
+
+		/*
+		 * For ValidXLogRecord(), we need CRC of: payload + header (up to xl_crc)
+		 * The recovered CRC is payload-only, so add header portion.
+		 */
+		full_crc = recovered_payload_crc;
+		COMP_CRC32C(full_crc, (char *) work_record, offsetof(XLogRecord, xl_crc));
+		FIN_CRC32C(full_crc);
+		work_record->xl_crc = full_crc;
+	}
+
+	return work_record;
+}
+
+
+/* ----------
+ * GUC callbacks
+ * ----------
+ */
+
+/*
+ * GUC check hook for key
+ */
+static bool
+check_test_tde_key(char **newval, void **extra, GucSource source)
+{
+	if (*newval == NULL || strlen(*newval) == 0)
+		return true;
+
+	if (strlen(*newval) != 64)
+	{
+		GUC_check_errdetail("Key must be exactly 64 hex characters (256 bits).");
+		return false;
+	}
+
+	/* Validate hex characters */
+	for (int i = 0; i < 64; i++)
+	{
+		char		c = (*newval)[i];
+
+		if (!((c >= '0' && c <= '9') ||
+			  (c >= 'a' && c <= 'f') ||
+			  (c >= 'A' && c <= 'F')))
+		{
+			GUC_check_errdetail("Key must contain only hex characters (0-9, a-f, A-F).");
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/* ----------
+ * Module entry points
+ * ----------
+ */
+
+/*
+ * Module initialization
+ */
+void
+_PG_init(void)
+{
+	unsigned char key[32];
+
+	/*
+	 * Create memory context for encryption buffers and allow allocation
+	 * in critical sections. This is necessary because WAL encryption runs
+	 * inside critical sections, and OOM there will cause PANIC anyway.
+	 */
+	test_tde_cxt = AllocSetContextCreate(TopMemoryContext,
+										 "test_tde",
+										 ALLOCSET_DEFAULT_SIZES);
+	MemoryContextAllowInCriticalSection(test_tde_cxt, true);
+
+	/*
+	 * Define GUC for encryption key.
+	 *
+	 * PGC_POSTMASTER: Key can only be set at server start to prevent
+	 * accidental runtime changes.
+	 *
+	 * WARNING: Once data is encrypted with a key, that same key MUST be used
+	 * for the lifetime of the data. Changing the key (even across restarts)
+	 * will cause decryption failures and data corruption. This reference
+	 * implementation does not support key rotation.
+	 */
+	DefineCustomStringVariable("test_tde.key",
+							   "Encryption key in hex format (64 characters = 256 bits).",
+							   "WARNING: Key must never change once data is encrypted!",
+							   &test_tde_key_hex,
+							   "",
+							   PGC_POSTMASTER,
+							   GUC_SUPERUSER_ONLY,
+							   check_test_tde_key,
+							   NULL,
+							   NULL);
+
+	MarkGUCPrefixReserved("test_tde");
+
+	/*
+	 * Parse key and initialize cipher context if key is configured.
+	 * cipher_ctx remains NULL if no key is set, disabling encryption.
+	 */
+	if (test_tde_key_hex != NULL && strlen(test_tde_key_hex) == 64)
+	{
+		if (!parse_hex_key(test_tde_key_hex, key, 32))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("test_tde: failed to parse encryption key")));
+
+		cipher_ctx = EVP_CIPHER_CTX_new();
+		if (!cipher_ctx)
+			ereport(ERROR,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("test_tde: failed to create cipher context")));
+
+		if (EVP_EncryptInit_ex(cipher_ctx, EVP_aes_256_ctr(), NULL, key, NULL) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("test_tde: failed to initialize cipher context")));
+
+		/* Clear key from stack */
+		explicit_bzero(key, sizeof(key));
+	}
+
+	/* Install hooks (save previous values for chaining) */
+	prev_mdread_post_hook = mdread_post_hook;
+	mdread_post_hook = test_tde_mdread_post;
+
+	prev_mdwrite_pre_hook = mdwrite_pre_hook;
+	mdwrite_pre_hook = test_tde_mdwrite_pre;
+
+	prev_mdextend_pre_hook = mdextend_pre_hook;
+	mdextend_pre_hook = test_tde_mdextend_pre;
+
+	prev_xlog_insert_pre_hook = xlog_insert_pre_hook;
+	xlog_insert_pre_hook = test_tde_xlog_insert_pre;
+
+	prev_xlog_decode_pre_hook = xlog_decode_pre_hook;
+	xlog_decode_pre_hook = test_tde_xlog_decode_pre;
+
+	ereport(LOG,
+			(errmsg("test_tde: initialized (WARNING: for testing only!)")));
+}
+
+/*
+ * Module finalization (the server does not unload libraries, so this is unused)
+ */
+void
+_PG_fini(void)
+{
+	/* Restore previous hooks */
+	xlog_decode_pre_hook = prev_xlog_decode_pre_hook;
+	xlog_insert_pre_hook = prev_xlog_insert_pre_hook;
+	mdextend_pre_hook = prev_mdextend_pre_hook;
+	mdwrite_pre_hook = prev_mdwrite_pre_hook;
+	mdread_post_hook = prev_mdread_post_hook;
+
+	/* Free OpenSSL cipher context (also clears key material) */
+	if (cipher_ctx != NULL)
+	{
+		EVP_CIPHER_CTX_free(cipher_ctx);
+		cipher_ctx = NULL;
+	}
+
+	/*
+	 * Delete memory context - this frees all buffers allocated from it
+	 * (encrypt_buffer, encrypt_buffer_ptrs, wal_encrypt_buffer).
+	 */
+	if (test_tde_cxt != NULL)
+	{
+		MemoryContextDelete(test_tde_cxt);
+		test_tde_cxt = NULL;
+	}
+
+	/* Reset buffer pointers */
+	encrypt_buffer = NULL;
+	encrypt_buffer_ptrs = NULL;
+	encrypt_buffer_nblocks = 0;
+	wal_encrypt_buffer = NULL;
+	wal_encrypt_buffer_size = 0;
+}
diff --git a/contrib/test_tde/test_tde.conf b/contrib/test_tde/test_tde.conf
new file mode 100644
index 00000000000..0b00366474c
--- /dev/null
+++ b/contrib/test_tde/test_tde.conf
@@ -0,0 +1,2 @@
+shared_preload_libraries = 'test_tde'
+test_tde.key = '0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef'
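
The configuration above supplies the key as 64 hex characters, and
_PG_init() hands it to parse_hex_key(), whose body is not part of this
excerpt. As a rough illustration only, a decoder of that shape could look
like the sketch below. The names demo_hex_value and demo_parse_hex_key are
hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one hex digit to 0..15, or return -1 if it is not a hex digit. */
static int
demo_hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Decode exactly 2 * out_len hex characters into out.
 * Returns 1 on success, 0 on a length mismatch or a non-hex character.
 * (Hypothetical helper; the patch's parse_hex_key may differ.)
 */
static int
demo_parse_hex_key(const char *hex, unsigned char *out, size_t out_len)
{
    assert(out != NULL);

    if (hex == NULL || strlen(hex) != 2 * out_len)
        return 0;

    for (size_t i = 0; i < out_len; i++)
    {
        int hi = demo_hex_value(hex[2 * i]);
        int lo = demo_hex_value(hex[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return 0;
        out[i] = (unsigned char) ((hi << 4) | lo);
    }
    return 1;
}
```

Called with the sample key and a 32-byte output buffer, this yields the
bytes 0x01, 0x23, 0x45 and so on. A caller would still want to wipe the
output buffer once the cipher context has been initialized.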
-- 
2.50.1 (Apple Git-155)
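
For readers who want to check the 21-byte framing used by the decode hook
([block_id=251 (1B)] [payload_length (4B)] [IV 16B] [encrypted payload]),
here is a standalone sketch of writing and parsing that prefix. It assumes,
as the decode path suggests, that payload_length counts the IV plus the
ciphertext and is stored little-endian. All DEMO_ and demo_ names are
made up for illustration and mirror, but are not, the patch's constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_ID_TRANSFORMED 251   /* mirrors XLR_BLOCK_ID_TRANSFORMED */
#define DEMO_IV_SIZE 16                 /* mirrors WAL_ENCRYPT_IV_SIZE */
#define DEMO_HDR_SIZE 5                 /* 1-byte marker + 4-byte length */
#define DEMO_OVERHEAD (DEMO_HDR_SIZE + DEMO_IV_SIZE)    /* 21 bytes */

/* Write marker, length (IV + ciphertext) and IV; returns bytes written. */
static size_t
demo_write_transform_header(unsigned char *dst,
                            const unsigned char iv[DEMO_IV_SIZE],
                            uint32_t ciphertext_len)
{
    uint32_t len = DEMO_IV_SIZE + ciphertext_len;

    assert(dst != NULL);
    dst[0] = DEMO_BLOCK_ID_TRANSFORMED;
    dst[1] = (unsigned char) (len >> 0);    /* little-endian, unaligned */
    dst[2] = (unsigned char) (len >> 8);
    dst[3] = (unsigned char) (len >> 16);
    dst[4] = (unsigned char) (len >> 24);
    memcpy(dst + DEMO_HDR_SIZE, iv, DEMO_IV_SIZE);
    return DEMO_OVERHEAD;
}

/*
 * Parse the prefix of a record payload.
 * Returns 1 if transformed and well-formed, 0 if the marker is absent
 * (pass through), -1 if the marker is present but the length is bogus.
 */
static int
demo_parse_transform_header(const unsigned char *payload, size_t payload_len,
                            unsigned char iv_out[DEMO_IV_SIZE],
                            uint32_t *ciphertext_len)
{
    uint32_t len;

    if (payload_len < DEMO_OVERHEAD || payload[0] != DEMO_BLOCK_ID_TRANSFORMED)
        return 0;

    len = ((uint32_t) payload[1] << 0) | ((uint32_t) payload[2] << 8) |
          ((uint32_t) payload[3] << 16) | ((uint32_t) payload[4] << 24);

    /* The length must cover the IV and stay inside the payload. */
    if (len < DEMO_IV_SIZE || len > payload_len - DEMO_HDR_SIZE)
        return -1;

    memcpy(iv_out, payload + DEMO_HDR_SIZE, DEMO_IV_SIZE);
    *ciphertext_len = len - DEMO_IV_SIZE;
    return 1;
}
```

The three-way return value keeps "not transformed" distinct from
"transformed but malformed", which are the two early-return cases the
hook handles differently (silent pass-through versus WARNING).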

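The CRC handling in the decode hook relies on the value in IV[0..3] being
a running (not yet finalized) CRC-32C of the plaintext payload, so that
the header bytes can be appended and the result finalized afterwards. The
sketch below illustrates that chaining with a plain bitwise CRC-32C. It
is not the server's pg_crc32c implementation, and the demo_ names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IV_SIZE 16

/* Same start value and final XOR as INIT_CRC32C / FIN_CRC32C. */
#define DEMO_CRC_INIT 0xFFFFFFFFu
#define DEMO_CRC_FIN(crc) ((crc) ^ 0xFFFFFFFFu)

/* Feed bytes into a running CRC-32C (Castagnoli, reflected 0x82F63B78). */
static uint32_t
demo_crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Writer side: save the running payload CRC in IV[0..3], little-endian. */
static void
demo_store_payload_crc(unsigned char iv[DEMO_IV_SIZE],
                       const void *payload, size_t len)
{
    uint32_t running = demo_crc32c_update(DEMO_CRC_INIT, payload, len);

    assert(iv != NULL);
    iv[0] = (unsigned char) (running >> 0);
    iv[1] = (unsigned char) (running >> 8);
    iv[2] = (unsigned char) (running >> 16);
    iv[3] = (unsigned char) (running >> 24);
}

/* Reader side: resume from IV[0..3], add the header bytes, finalize. */
static uint32_t
demo_finish_record_crc(const unsigned char iv[DEMO_IV_SIZE],
                       const void *header, size_t header_len)
{
    uint32_t running = ((uint32_t) iv[0] << 0) | ((uint32_t) iv[1] << 8) |
                       ((uint32_t) iv[2] << 16) | ((uint32_t) iv[3] << 24);

    running = demo_crc32c_update(running, header, header_len);
    return DEMO_CRC_FIN(running);
}
```

Resuming from the stored value and feeding the header gives the same
result as one pass over payload followed by header, which is the order
the record CRC is computed in. Storing a finalized CRC instead would
break this, since the final XOR would have to be undone first.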