Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Hi Tomas,
Thank you for this critical feedback. Your concerns go to the heart of the
proposal's viability, and I appreciate your directness.
1. Multiple Extensions and Hook Chaining
You're right to question this. To be honest, I have significant doubts
about allowing multiple transformation extensions simultaneously.
The Transform ID coordination problem is real: without a registry or
protocol between extensions, they cannot cooperate safely. Hook chaining
for read/write operations might work (extension A compresses, extension B
encrypts; the reverse order gains nothing, because encrypted data does not
compress), but the Transform ID field creates conflicts.
Perhaps I should be more direct: transformation hook chaining is not
realistically possible with the current design. TDE extensions would need
exclusive use of these hooks. This is a fundamental limitation I should
have stated clearly in the RFC.
2. pd_flags Reservation - I Hope You'll Consider This
I understand your concern about reserving pd_flags bits for extensions.
However, I'd like to ask you to consider the reasoning behind this choice.
The 5-bit Transform ID serves a critical purpose: it allows the core to
identify the page's transformation state without attempting decryption.
This is important for:
- Error reporting: "This page is encrypted with transform ID 5, but no
extension is loaded to handle it"
- Migration safety: Distinguishing between untransformed pages (ID=0) and
transformed pages during gradual encryption
- Crash recovery: The core can detect transformation state inconsistencies
That said, I recognize pd_flags is precious and limited. Let me propose an
alternative approach that might better align with core principles:
Instead of extension-specific Transform IDs, what if we allow extensions to
reserve space at pd_upper (similar to how special space works at
pd_special)?
The core could manage a small flag (2-3 bits) indicating "N bytes at
pd_upper are reserved for transformation metadata". By encoding N in units
of 2 or 4 bytes rather than single bytes, a few bits cover a useful range:
- 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
- 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
- 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
This approach uses minimal pd_flags bits while providing substantial
metadata space. It would:
- Keep the flag in core control (not extension-specific)
- Allow extensions to store IV, authentication tags, key version, etc. in a
standardized location
- Be self-describing (the flag tells you how much space is reserved)
- Generalize beyond encryption (compression, checksums, etc. could use it)
In our internal implementation, we actually add opaque bytes to PageHeader
for encryption metadata. This pd_upper approach could formalize that
pattern for extensions.
I believe some form of page-level metadata for transformations is
necessary. Would either approach (Transform ID or pd_upper reservation) be
acceptable with the right design, or do you see fundamental issues with
page-level transformation metadata itself?
3. Maintenance Burden and Test Coverage
I deeply appreciate this concern. Having worked across various DBMS
implementations, I've seen solution vendors ship without comprehensive
regression testing, but I have never seen a database vendor do so. DBMS
maintenance is extraordinarily difficult, and storage errors are
catastrophic.
This is precisely why test_tde exists as a reference implementation. But
you've identified the real issue: we need much stronger test coverage for
the hooks themselves.
The test cases should:
- Detect when core changes break hook contracts
- Verify hook behavior under all I/O paths (sync, async, error cases)
- Validate critical section safety
- Test interaction with checksums, crash recovery, replication
I agree the current test coverage is insufficient for core inclusion. Would
expanding the test suite to cover these scenarios address your maintenance
concerns, or do you see fundamental fragility beyond what testing can solve?
4. Hooks vs Transform Layer - Pragmatic Timeline
You suggested improving SMGR extensibility rather than adding hooks. I
think you're architecturally right about the long-term direction.
However, I want to be pragmatic about timelines:
The hook and pd_flags approach, despite its limitations, can deliver
working TDE in the shortest time. Organizations facing regulatory deadlines
need something that works now, not in 2-3 years.
That said, your feedback has sparked a better idea: what if we think of
this not as "SMGR extension" or "hooks" but as a pluggable Transform Layer
that SMGR and WAL subsystems delegate to?
Conceptually:
Application Layer
|
Buffer Manager
|
+------------------+
| Transform Layer | <-- Encryption, etc.
+------------------+
|
SMGR / WAL
|
File I/O
This is architecturally cleaner than scattered hooks, and more focused than
full SMGR extensibility. The Transform Layer would:
- Provide a unified interface for data transformation
- Work across backend, frontend tools, and replication
- Handle metadata management in a standardized way
- Support encryption, compression, or other transformations
I think this deserves its own discussion thread rather than conflating it
with the current hook proposal. Would you be interested in starting a
separate conversation about designing a Transform Layer interface for
PostgreSQL?
In the meantime, the hook approach could serve organizations with immediate
needs, and extensions could migrate to the Transform Layer once it's
stabilized.
5. Frontend Tool Access
Both SMGR and hook approaches face a shared limitation: frontend tools
(pg_checksums, pg_basebackup, etc.) that read files directly.
I previously suggested allowing initdb to specify a shared library that
both backend and frontend can load for transformation. But as I reconsider
this, it feels like it converges toward the Transform Layer idea: a
well-defined interface that any PostgreSQL component can use.
This might be the real architectural question: not "hooks vs SMGR" but "how
should PostgreSQL provide transformation points that work across backend,
frontend, and replication boundaries?"
Summary
Your feedback has clarified three important points:
1. The current hook design has real limitations (multiple extension
conflicts, pd_flags concerns)
2. Test coverage needs to be much more comprehensive
3. A cleaner abstraction might be needed long-term
I propose a dual approach:
Short-term: Move forward with the hook proposal for organizations with
immediate regulatory needs. I commit to:
- Stating clearly that hook chaining is not supported
- Significantly expanding test coverage
- Treating this as a pragmatic solution with known limitations
Long-term: I'd like to start a separate discussion about a Transform Layer
abstraction - a unified interface that could handle data transformation
across backend, frontend tools, and replication. This would be
architecturally cleaner than scattered hooks, and could eventually
supersede this approach.
Would you be willing to review a Transform Layer proposal in a separate
thread? I think it addresses the architectural concerns you've raised,
while the hook approach serves immediate practical needs.
Best regards,
Henson
On Mon, Dec 29, 2025 at 4:24 AM Tomas Vondra <[email protected]> wrote:
> On 12/28/25 08:49, Henson Choi wrote:
> >
> > 3. Proposal Specifications
> >
> >
> > 3.1 The Interface (Hook Points)
> >
> > We allow intervention by security experts through five contact points
> > along the I/O path:
> >
> > * *Read/Write Hooks:* mdread_post, mdwrite_pre, mdextend_pre
> > (Transformation of the data area)
> > * *WAL Hooks:* xlog_insert_pre, xlog_decode_pre (Transformation of
> > transaction logs)
> >
> >
> > 3.2 The Protocol Identifier (PageHeader Transformation ID)
> >
> > We allocate 5 bits of pd_flags to define the “Security State” of a
> > page. This serves as a *Status Message* sent by the security expert to
> > the engine, utilized for key versioning and as a migration marker.
> >
>
> Isn't this rather problematic?
>
> This seems to be meant to be extensible, which means there can be
> multiple extensions setting the hooks. Which we generally allow, and the
> custom is to call the previous hook.
>
> What happens if there are multiple extensions implementing the hook?
> Would that be allowed or prohibited in this case? Maybe it doesn't make
> sense, but then why wouldn't it be possible?
>
> FWIW I find it very unlikely we'd allow reserving pd_flags bits for an
> extension. These bits are meant to be used by core, there's very limited
> number of such bits.
>
>
> In general, I'm somewhat skeptical of the claim a collection of hooks is
> "low-barrier, high-safety". It seems pretty fragile to me, and I can
> envision a lot of maintenance difficulties in the future. Not just for
> the extension developers, but for the project too - adding a bunch of
> random hooks is not free for us, we'll need to keep it working in future
> releases, etc.
>
> Perhaps the current SMGR code is not extensible/flexible enough, but
> then we need to improve that. I'd imagine a simple SMGR doing the
> encryption, but federating most of the work to a "full" SMGR. But I
> haven't thought about that too much.
>
>
> regards
>
> --
> Tomas Vondra
>
>