Re: RFC: PostgreSQL Storage I/O Transformation Hooks

Henson Choi Sun, 28 Dec 2025 17:47:02 -0800

Hi Zsolt,


Thank you for the detailed technical feedback. Let me address each point.


1. AIO Extensibility and SMGR Approach

I think the SMGR extensibility approach is equally valid. In fact, when I
realized in PG18 that buffer page reads are split between md.c (mdreadv)
and bufmgr.c (buffer_readv_complete_one), I felt some discomfort about
where to place the decryption hook. "Does this really belong in both
places?" was my first thought.

The SMGR approach could provide a cleaner, more unified integration point
for data transformation.

The main difference is timing and current availability:

- The hook approach is working today and can be used immediately
- Your SMGR extensibility work provides a more comprehensive long-term
solution

I don't see these as competing proposals. Both approaches are valid and
serve different needs. The hook infrastructure can serve as an interim
solution for organizations that need TDE now, while the community develops
the more comprehensive SMGR extensibility.

In the long term, if SMGR extensibility provides better integration points,
extensions could migrate to that approach.
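
To check my understanding of the ordering you describe, here is a minimal
standalone sketch. It is not the PostgreSQL AIO API: IoHandle,
register_callback, complete_io and the *_sketch functions are hypothetical
names invented for illustration, and the XOR stands in for real decryption.
It only shows that if the decorator registers its callback before calling
the original start function, and completion runs callbacks in reverse
registration order, the md handler runs first and the extension's second.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLBACKS 4
#define SKETCH_PAGE_LEN 16

/* Hypothetical completion callback: may transform the page in place. */
typedef void (*completion_cb)(unsigned char *page, size_t len);

typedef struct IoHandle {
    completion_cb callbacks[MAX_CALLBACKS];
    int ncallbacks;
} IoHandle;

/* Records the order in which callbacks ran: 'M' = md, 'T' = TDE. */
static char call_log[MAX_CALLBACKS];
static int call_log_len = 0;

static void register_callback(IoHandle *io, completion_cb cb)
{
    assert(io->ncallbacks < MAX_CALLBACKS);
    io->callbacks[io->ncallbacks++] = cb;
}

/* Completion runs callbacks in reverse registration order. */
static void complete_io(IoHandle *io, unsigned char *page, size_t len)
{
    for (int i = io->ncallbacks - 1; i >= 0; i--)
        io->callbacks[i](page, len);
}

/* Stand-in for the md completion handler. */
static void md_complete(unsigned char *page, size_t len)
{
    (void) page;
    (void) len;
    call_log[call_log_len++] = 'M';
}

/* Stand-in for the extension's handler; XOR is a placeholder, not crypto. */
static void tde_complete(unsigned char *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;
    call_log[call_log_len++] = 'T';
}

/* Stand-in for the original start function. */
static void md_startreadv_sketch(IoHandle *io)
{
    register_callback(io, md_complete);
}

/* Decorator: add our callback first, then call the original. */
static void tde_startreadv_sketch(IoHandle *io)
{
    register_callback(io, tde_complete);
    md_startreadv_sketch(io);
}

/* Returns 1 if md ran first, TDE second, and the page was "decrypted". */
static int demo_order_is_md_then_tde(void)
{
    IoHandle io = {0};
    unsigned char page[SKETCH_PAGE_LEN];

    memset(page, 0x5A, sizeof page);   /* "ciphertext" of an all-zero page */
    call_log_len = 0;
    tde_startreadv_sketch(&io);
    complete_io(&io, page, sizeof page);
    return call_log_len == 2 && call_log[0] == 'M' &&
           call_log[1] == 'T' && page[0] == 0;
}
```

If this matches the intended behavior, the decorator never needs to see the
segment loop inside md.c; it only needs a way to add an entry to the
callback list.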


2. Understanding PostgreSQL Internals

You're absolutely right that extension developers need to understand
multiprocess architecture, memory management, critical sections, and so on.

This is precisely why test_tde exists as a reference implementation. It
documents the "dance steps" with the core - showing where memory must be
pre-allocated, how to handle critical sections safely, when AIO completion
might happen in a different process, and so on.
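
One of those steps, sketched outside PostgreSQL: the transformation path
must not allocate memory, so the scratch buffer is allocated once at
initialization and only reused afterwards. This is a simplified standalone
illustration, not the test_tde code. The names tde_sketch_init and
tde_sketch_encrypt_for_write are hypothetical, malloc stands in for
PostgreSQL's allocator, and the XOR is a placeholder for a real cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192

/* Scratch buffer allocated once, before any I/O path can need it. */
static unsigned char *scratch = NULL;

/* Called once at startup; returns 1 on success, 0 on allocation failure. */
int tde_sketch_init(void)
{
    if (scratch == NULL)
        scratch = malloc(SKETCH_PAGE_SIZE);
    return scratch != NULL;
}

/*
 * Write-path transformation. It performs no allocation, so it would be
 * safe to call from a context where allocating is not allowed. The source
 * page (think: shared buffer) is left untouched; the encrypted copy lives
 * in the preallocated scratch buffer.
 */
const unsigned char *tde_sketch_encrypt_for_write(const unsigned char *page)
{
    assert(scratch != NULL);           /* init must have run first */
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
        scratch[i] = page[i] ^ 0xA5;   /* placeholder, not real crypto */
    return scratch;
}
```

A single static buffer like this is only valid per process; anything that
must be visible to another process (for example a worker completing the
I/O) would need shared memory instead.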

The goal isn't to hide PostgreSQL's complexity, but to provide a working
example that shows cryptography experts exactly where and how to integrate
their algorithms within PostgreSQL's constraints.


3. Contributing to Existing Solutions vs Korean Regulations

I appreciate the suggestion about contributing to existing solutions. I
personally prefer the OpenSSL Provider approach for algorithm extensibility.

However, the reality is more complex.

Cryptography experts often have their own libraries developed over decades.
While it might look like "just encryption code" to me, I don't have the
authority to force them to adopt specific frameworks.

ARIA and SEED are already implemented in OpenSSL. However, Korean law
requires certified implementations. Specifically, companies must use
nationally-certified builds and provide the hash codes of those specific
library binaries to regulators. You cannot simply use the OpenSSL version,
even if the algorithm is identical.

This is why we need an extension mechanism rather than hardcoding specific
libraries into core. Different jurisdictions have different certification
requirements.


4. WAL vs Data File Encryption

You mentioned that EU regulations might be satisfied by encrypting only
data files. That's a valid practical consideration.

In Korea, regulations require the use of approved cryptographic
algorithms, but in practice most systems run AES because ARIA and SEED
lack CPU acceleration. Supporting them is largely a legal compliance
checkbox.

Regarding what to protect (WAL vs heap vs both), there's flexibility
depending on the organization and jurisdiction. The hook approach allows
extensions to choose - you can implement only the buffer hooks if that
satisfies your requirements, or add WAL hooks if needed.
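
As a rough illustration of that choice, the usual function-pointer hook
pattern looks like the sketch below. The hook names and the
write_page_sketch / write_wal_sketch functions are hypothetical and are not
the names used in the patch. The point is only that an unset hook is a
no-op, so an extension can install one and leave the other alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: transform a buffer in place before it is written. */
typedef void (*page_transform_hook)(unsigned char *buf, size_t len);

/* NULL means "no extension installed this hook". */
static page_transform_hook buffer_write_hook_sketch = NULL;
static page_transform_hook wal_write_hook_sketch = NULL;

/* Placeholder transformation; a real extension would call its cipher. */
static void sketch_xor_transform(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5A;
}

/* Stand-in for the data page write path. */
void write_page_sketch(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (buffer_write_hook_sketch != NULL)
        buffer_write_hook_sketch(buf, len);
    /* ... the actual write would happen here ... */
}

/* Stand-in for the WAL write path. */
void write_wal_sketch(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (wal_write_hook_sketch != NULL)
        wal_write_hook_sketch(buf, len);
    /* ... the actual write would happen here ... */
}
```

An extension that only needs data file encryption would assign
buffer_write_hook_sketch at load time and leave wal_write_hook_sketch NULL.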


5. Fork Files vs Page Header for Metadata

You asked whether custom WAL records about encryption events could solve
the crash recovery problem with fork files.

That's a reasonable approach for SMGR-based solutions where you control the
storage layer. However, with the hook approach, we don't have the ability
to inject custom WAL records for encryption events.

Currently, in a replication environment, the reference implementation
requires the same key to be configured in the settings on both primary and
replicas (shared key model). For future KMS integration, I'm considering
mechanisms to propagate keys to replicas through external channels rather
than WAL.

The page header approach was chosen because it keeps the encryption state
self-contained within each page, avoiding the need for separate metadata
synchronization.
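
A minimal sketch of what "self-contained" means here, under assumptions:
the HdrPage layout, the key_id field and the one-byte key table are
invented for illustration and do not reflect the actual page header format
or the reference implementation. The decrypt side consults nothing but the
page itself to decide whether and how to decrypt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_PAYLOAD 16
#define HDR_NKEYS 3

/* Hypothetical page: the encryption state travels with the page. */
typedef struct HdrPage {
    uint32_t key_id;                     /* 0 means plaintext */
    unsigned char payload[HDR_PAYLOAD];
} HdrPage;

/* Toy one-byte "keys" indexed by key id; placeholder for real key material. */
static const unsigned char hdr_keys[HDR_NKEYS] = {0x00, 0x11, 0x22};

static void hdr_xor(HdrPage *p, unsigned char k)
{
    for (size_t i = 0; i < HDR_PAYLOAD; i++)
        p->payload[i] ^= k;
}

/* Encrypt a plaintext page and record which key was used in its header. */
void hdr_encrypt(HdrPage *p, uint32_t key_id)
{
    assert(p->key_id == 0 && key_id > 0 && key_id < HDR_NKEYS);
    hdr_xor(p, hdr_keys[key_id]);
    p->key_id = key_id;
}

/* Decrypt using only what the page header says; no external metadata. */
void hdr_decrypt(HdrPage *p)
{
    assert(p->key_id < HDR_NKEYS);
    if (p->key_id == 0)
        return;                          /* already plaintext */
    hdr_xor(p, hdr_keys[p->key_id]);
    p->key_id = 0;
}
```

Because the state and the data are written together in one page, there is
no second file that could disagree with the page after a crash.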


6. Gradual Rotation Mechanism

I agree with you - I don't think core support is necessary for gradual
rotation either.

I mentioned it in my earlier reply only as a possible reference
implementation concept to guide encryption developers. It can and should
be implemented in the extension's background worker, not in core.
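
To make the idea concrete, one pass of such a worker could look roughly
like this. It is a standalone sketch under assumptions: RotPage,
rotate_batch and the toy key table are hypothetical, the pages are an
in-memory array rather than real relation blocks, and XOR stands in for the
cipher. Each call re-encrypts at most batch_limit pages that still carry
the old key id, so the work can be spread over many wakeups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROT_PAYLOAD 8
#define ROT_NKEYS 3

/* Hypothetical page carrying the id of the key it was encrypted with. */
typedef struct RotPage {
    uint32_t key_id;
    unsigned char payload[ROT_PAYLOAD];
} RotPage;

/* Toy one-byte "keys" indexed by key id; placeholder for real key material. */
static const unsigned char rot_keys[ROT_NKEYS] = {0x00, 0x3C, 0xC3};

static void rot_xor(RotPage *p, unsigned char k)
{
    for (size_t i = 0; i < ROT_PAYLOAD; i++)
        p->payload[i] ^= k;
}

/*
 * One worker pass: re-encrypt up to batch_limit pages from old_id to
 * new_id. Returns the number of pages rotated; 0 means rotation is done.
 */
size_t rotate_batch(RotPage *pages, size_t npages,
                    uint32_t old_id, uint32_t new_id, size_t batch_limit)
{
    size_t done = 0;

    assert(old_id < ROT_NKEYS && new_id < ROT_NKEYS);
    for (size_t i = 0; i < npages && done < batch_limit; i++)
    {
        if (pages[i].key_id != old_id)
            continue;                            /* already rotated */
        rot_xor(&pages[i], rot_keys[old_id]);    /* decrypt with old key */
        rot_xor(&pages[i], rot_keys[new_id]);    /* encrypt with new key */
        pages[i].key_id = new_id;
        done++;
    }
    return done;
}
```

The worker would call rotate_batch repeatedly, sleeping between calls,
until it returns 0. Pages under both keys stay readable in the meantime
because each page records its own key id.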


Summary

I see the hook approach and SMGR extensibility as equally valid, addressing
different timelines and use cases:

- Hooks: Available now, lighter-weight, sufficient for compliance-driven TDE
- SMGR extensibility: More comprehensive, cleaner architecture, better
long-term solution

Both should coexist. Organizations can use hooks today while SMGR
extensibility matures, then migrate if the SMGR approach better fits their
needs.

I'm very interested in your experience with pg_tde and the SMGR
extensibility work. If there are specific design considerations from that
work that would inform these hooks, I'd appreciate your input.

Best regards,
Henson

On Mon, Dec 29, 2025 at 2:55 AM, Zsolt Parragi <[email protected]> wrote:

> > - mdread_post_hook: inside the segment loop → Decorator NOT possible
>
> > The mdreadv() function, introduced in PostgreSQL 17 as part of the
> > vectored I/O API, processes multiple blocks in a loop that respects
> > segment boundaries. The decryption hook must be called inside this loop,
> > after each segment's FileReadV() completes. A decorator wrapping
> > mdreadv() from the outside cannot access this internal loop timing.
>
> It is possible - or rather, we plan to propose a different patch for
> that. There are already some discussions about extendibility of AIO,
> which is currently quite minimal, and this is another point for that.
> If you look into the AIO sources, it already uses an array of
> callbacks, and there's only a small missing piece there - making it
> possible for extensions to add entries to that array. With that patch,
> it is possible to decorate smgr_startreadv, add your own callback, and
> then call the original mdstartreadv function. Since aio callbacks are
> executed in the opposite order, this will work out exactly as needed,
> as the AIO handler will first call the md completion handler, then
> yours.
>
> My logic here is similar to the previous argument: this AIO
> extensibility for startreadv is also needed for other uses of the smgr
> extension, most likely for everyone who uses the current patch. It
> shouldn't be specific to encryption.
>
> > With the SMGR decorator approach, the extension developer must:
> > - Track upstream md.c changes
> > - Replicate the internal loop logic to find the right decryption point
>
> > With hooks, the extension developer only needs to:
> > - Implement encrypt() and decrypt()
>
> > We need a simple, stable hook interface that allows local security
> > experts to integrate these required algorithms - experts who understand
> > cryptography but not PostgreSQL storage internals.
>
> Extension developers still have to understand the multiprocess nature
> of postgres (with AIO you also have to remember that it is possible
> for the completion to happen in a different process, possibly in a
> worker process), or its unusual memory management patterns, critical
> sections, and so on. You most likely also have to deal with shared
> memory caches, locks, and so on.
>
> (And as I said above, you don't have to replicate/track md.c, we only
> need a good, generic extension point usable for many extensions)
>
> > In South Korea, government
> > regulations require the use of nationally-approved cryptographic
> > algorithms (such as ARIA, SEED). This means organizations often cannot
> > adopt foreign TDE solutions, regardless of their technical merit.
>
> Have you considered contributing to existing solutions? Adding support
> to multiple algorithms to an existing library is easier than
> developing your own from scratch.
>
> > WAL and heap pages are simply different representations of the same
> > underlying data. Protecting only one side would be cryptographically
> > incomplete; an attacker could bypass encryption by reading the
> > unprotected side. Therefore, they must be treated as a single atomic
> > unit of protection.
>
> From a security point of view, I agree. From a practical one, it's a
> bit more complicated. As you mentioned South Korean regulations, we
> also have regulations in the European Union, and you can conform to
> the current regulations by only encrypting your data files (at least
> that's what I heard, I'm not a lawyer).
>
> So from a practical point of view, for us, even getting support for
> table encryption hooks into the core would be a success.
>
> > My primary concern with using fork files for encryption metadata is crash
> > recovery. If a fork file and the actual data page become inconsistent
> > (e.g., during a crash), recovery becomes problematic because fork files
> > are not typically protected by WAL.
>
> Custom WAL records about encryption events (key rotation/change/etc)
> should solve this problem?
>
> > I plan to propose a separate RFC for this
> > "gradual rotation" mechanism.
>
> Would this gradual rotation mechanism be useful for anything else
> other than encryption extensions? While I also had the same idea, I
> don't see how it would be useful for anything else, so I didn't plan
> to submit any patches related to this. This is something that can be
> easily implemented as a background worker in a tde extension, and
> doesn't really require core support.
>
