Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks

Hi Zsolt,

Thank you for the detailed technical feedback. Let me address each point.

1. AIO Extensibility and SMGR Approach

I think the SMGR extensibility approach is equally valid. In fact, when I
realized in PG18 that buffer page reads are split between md.c (mdreadv)
and bufmgr.c (buffer_readv_complete_one), I felt some discomfort about
where to place the decryption hook. "Does this really belong in both
places?" was my first thought. The SMGR approach could provide a cleaner,
more unified integration point for data transformation.

The main difference is timing and current availability:

- The hook approach is working today and can be used immediately
- Your SMGR extensibility work provides a more comprehensive long-term
  solution

I don't see these as competing proposals. Both approaches are valid and
serve different needs. The hook infrastructure can serve as an interim
solution for organizations that need TDE now, while the community develops
the more comprehensive SMGR extensibility. In the long term, if SMGR
extensibility provides better integration points, extensions could migrate
to that approach.

2. Understanding PostgreSQL Internals

You're absolutely right that extension developers need to understand
multiprocess architecture, memory management, critical sections, and so
on. This is precisely why test_tde exists as a reference implementation.
It documents the "dance steps" with the core - showing where memory must
be pre-allocated, how to handle critical sections safely, when AIO
completion might happen in a different process, and so on.

The goal isn't to hide PostgreSQL's complexity, but to provide a working
example that shows cryptography experts exactly where and how to integrate
their algorithms within PostgreSQL's constraints.

3. Contributing to Existing Solutions vs Korean Regulations

I appreciate the suggestion about contributing to existing solutions. I
personally prefer the OpenSSL Provider approach for algorithm
extensibility.
However, the reality is more complex. Cryptography experts often have
their own libraries developed over decades. While it might look like
"just encryption code" to me, I don't have the authority to force them to
adopt specific frameworks.

ARIA and SEED are already implemented in OpenSSL. However, Korean law
requires certified implementations. Specifically, companies must use
nationally-certified builds and provide the hash codes of those specific
library binaries to regulators. You cannot simply use the OpenSSL version,
even if the algorithm is identical.

This is why we need an extension mechanism rather than hardcoding specific
libraries into core. Different jurisdictions have different certification
requirements.

4. WAL vs Data File Encryption

You mentioned that EU regulations might be satisfied by encrypting only
data files. That's a valid practical consideration.

In Korea, regulations require the introduction of approved cryptographic
algorithms, but in practice most systems run AES due to lack of CPU
acceleration for ARIA/SEED. It's largely a legal compliance checkbox.

Regarding what to protect (WAL vs heap vs both), there's flexibility
depending on the organization and jurisdiction. The hook approach allows
extensions to choose - you can implement only the buffer hooks if that
satisfies your requirements, or add WAL hooks if needed.

5. Fork Files vs Page Header for Metadata

You asked whether custom WAL records about encryption events could solve
the crash recovery problem with fork files. That's a reasonable approach
for SMGR-based solutions where you control the storage layer.

However, with the hook approach, we don't have the ability to inject
custom WAL records for encryption events. Currently, in a replication
environment, the reference implementation requires the same key to be
configured in the settings on both primary and replicas (shared key
model).
For future KMS integration, I'm considering mechanisms to propagate keys
to replicas through external channels rather than WAL.

The page header approach was chosen because it keeps the encryption state
self-contained within each page, avoiding the need for separate metadata
synchronization.

6. Gradual Rotation Mechanism

I agree with you - I don't think core support is necessary for gradual
rotation either. I mentioned it in my earlier email response only as a
potential reference implementation concept to guide encryption developers.
It's something that can and should be implemented in the extension's
background worker, not in core.

Summary

I see the hook approach and SMGR extensibility as equally valid,
addressing different timelines and use cases:

- Hooks: Available now, lighter-weight, sufficient for compliance-driven
  TDE
- SMGR extensibility: More comprehensive, cleaner architecture, better
  long-term solution

Both should coexist. Organizations can use hooks today while SMGR
extensibility matures, then migrate if the SMGR approach better fits their
needs.

I'm very interested in your experience with pg_tde and the SMGR
extensibility work. If there are specific design considerations from that
work that would inform these hooks, I'd appreciate your input.

Best regards,
Henson

On Mon, Dec 29, 2025 at 2:55 AM, Zsolt Parragi <[email protected]> wrote:

> > - mdread_post_hook: inside the segment loop → Decorator NOT possible
>
> > The mdreadv() function, introduced in PostgreSQL 17 as part of the
> > vectored I/O API, processes multiple blocks in a loop that respects
> > segment boundaries. The decryption hook must be called inside this loop,
> > after each segment's FileReadV() completes. A decorator wrapping
> > mdreadv() from the outside cannot access this internal loop timing.
>
> It is possible - or rather, we plan to propose a different patch for
> that.
> There are already some discussions about extendibility of AIO,
> which is currently quite minimal, and this is another point for that.
> If you look into the AIO sources, it already uses an array of
> callbacks, and there's only a small missing piece there - making it
> possible for extensions to add entries to that array. With that patch,
> it is possible to decorate smgr_startreadv, add your own callback, and
> then call the original mdstartreadv function. Since aio callbacks are
> executed in the opposite order, this will work out exactly as needed,
> as the AIO handler will first call the md completion handler, then
> yours.
>
> My logic here is similar to the previous argument: this AIO
> extensibility for startreadv is also needed for other uses of the smgr
> extension, most likely for everyone who uses the current patch. It
> shouldn't be specific to encryption.
>
> > With the SMGR decorator approach, the extension developer must:
> > - Track upstream md.c changes
> > - Replicate the internal loop logic to find the right decryption point
>
> > With hooks, the extension developer only needs to:
> > - Implement encrypt() and decrypt()
>
> > We need a simple, stable hook interface that allows local security
> > experts to integrate these required algorithms - experts who understand
> > cryptography but not PostgreSQL storage internals.
>
> Extension developers still have to understand the multiprocess nature
> of postgres (with AIO you also have to remember that it is possible
> for the completion to happen in a different process, possibly in a
> worker process), or its unusual memory management patterns, critical
> sections, and so on. You most likely also have to deal with shared
> memory caches, locks, and so on.
>
> (And as I said above, you don't have to replicate/track md.c, we only
> need a good, generic extension point usable for many extensions)
>
> > In South Korea, government
> > regulations require the use of nationally-approved cryptographic
> > algorithms (such as ARIA, SEED). This means organizations often cannot
> > adopt foreign TDE solutions, regardless of their technical merit.
>
> Have you considered contributing to existing solutions? Adding support
> to multiple algorithms to an existing library is easier than
> developing your own from scratch.
>
> > WAL and heap pages are simply different representations of the same
> > underlying data. Protecting only one side would be cryptographically
> > incomplete; an attacker could bypass encryption by reading the
> > unprotected side. Therefore, they must be treated as a single atomic
> > unit of protection.
>
> From a security point of view, I agree. From a practical one, it's a
> bit more complicated. As you mentioned South Korean regulations, we
> also have regulations in the European Union, and you can conform to
> the current regulations by only encrypting your data files (at least
> that's what I heard, I'm not a lawyer).
>
> So from a practical point of view, for us, even getting support for
> table encryption hooks into the core would be a success.
>
> > My primary concern with using fork files for encryption metadata is crash
> > recovery. If a fork file and the actual data page become inconsistent
> > (e.g., during a crash), recovery becomes problematic because fork files
> > are not typically protected by WAL.
>
> Custom WAL records about encryption events (key rotation/change/etc)
> should solve this problem?
>
> > I plan to propose a separate RFC for this
> > "gradual rotation" mechanism.
>
> Would this gradual rotation mechanism be useful for anything else
> other than encryption extensions?
> While I also had the same idea, I
> don't see how it would be useful for anything else, so I didn't plan
> to submit any patches related to this. This is something that can be
> easily implemented as a background worker in a tde extension, and
> doesn't really require core support.
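
For reference, the callback ordering argument in the quoted AIO paragraph
can be modeled with a small standalone C sketch. This is only a toy model
under stated assumptions: every toy_* name is hypothetical, nothing here is
PostgreSQL's actual AIO or smgr API, and the XOR step is a placeholder for
a real cipher. It only shows that if the decorator registers its callback
first and then calls the original start-read function, reverse-order
completion runs the md step before the extension's decrypt step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CALLBACKS 4
#define TOY_PAGE_SIZE 16
#define TOY_KEY 0x5A                 /* placeholder "key"; XOR is not real crypto */

typedef struct ToyIo ToyIo;
typedef void (*toy_completion_cb)(ToyIo *io);

/* Hypothetical I/O handle: a callback array plus a tiny page buffer. */
struct ToyIo {
    toy_completion_cb callbacks[TOY_MAX_CALLBACKS];
    int ncallbacks;
    unsigned char page[TOY_PAGE_SIZE];
    char trace[TOY_MAX_CALLBACKS + 1];   /* which callbacks ran, in order */
    int ntrace;
};

/* Append a completion callback to the handle's array. */
static void toy_register(ToyIo *io, toy_completion_cb cb)
{
    assert(io->ncallbacks < TOY_MAX_CALLBACKS);
    io->callbacks[io->ncallbacks++] = cb;
}

/* Stand-in for the md completion step; records 'M' in the trace. */
static void toy_md_complete(ToyIo *io)
{
    io->trace[io->ntrace++] = 'M';
}

/* Stand-in for the extension's decrypt step; records 'X' in the trace. */
static void toy_ext_decrypt(ToyIo *io)
{
    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
        io->page[i] ^= TOY_KEY;
    io->trace[io->ntrace++] = 'X';
}

/* Stand-in for the original start-read: registers the md callback and
 * fills the page with "ciphertext" as if it had been read from disk. */
static void toy_md_startreadv(ToyIo *io)
{
    toy_register(io, toy_md_complete);
    memset(io->page, 'A' ^ TOY_KEY, TOY_PAGE_SIZE);
}

/* Decorator: register our callback first, then call the original. */
static void toy_ext_startreadv(ToyIo *io)
{
    toy_register(io, toy_ext_decrypt);
    toy_md_startreadv(io);
}

/* Completion: run callbacks in reverse registration order. */
static void toy_complete(ToyIo *io)
{
    for (int i = io->ncallbacks - 1; i >= 0; i--)
        io->callbacks[i](io);
}

/* Run one simulated read and return the execution trace. */
static const char *toy_simulate_read(ToyIo *io)
{
    memset(io, 0, sizeof(*io));
    toy_ext_startreadv(io);
    toy_complete(io);
    return io->trace;
}
```

Calling toy_simulate_read() on a ToyIo yields a trace in which 'M' precedes
'X', and the page buffer holds plaintext afterwards. Whether the real AIO
callback array can be extended this way depends on the patch mentioned in
the quoted message.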
