On Wed, Dec 3, 2025 at 3:48 AM Colin 't Hart <[email protected]> wrote:
> One of my clients has Microsoft Defender for Endpoint on Linux installed on
> their Postgres servers.
>
> I was testing a database restore from pgBackRest. The restore itself seemed
> to complete in a reasonable amount of time, but then the Postgres recovery
> started and it was extremely slow to retrieve and apply the WAL files.
>
> I noticed wdavdaemon taking most of the CPU, and Postgres getting very little.
These days, tools like that work by monitoring every read, write etc. via kernel event queues (fanotify on Linux, ESF on macOS, IDK on Windows; it might still be using something more efficient but less isolated, with tentacles inside the kernel). Those queues usually have a fixed size, and when they overflow because the event consumer isn't keeping up, the monitored process can be blocked. That's probably true even when it's running in a mode where it doesn't have to reply to allow the operation to proceed. Presumably the consumer is running some kind of rolling fingerprint check over the data, looking for things from its database of malware, which you'd hope would be very well optimised...

My pet theory is that PostgreSQL suffers from these systems more than anything else not because of the total bandwidth but because of the per-operation overheads and our historical 8KB-at-a-time disk and network I/O. Your report about pgBackRest supports that idea: it probably copies a larger total size in big chunks, while recovery reads the WAL 8KB at a time (and evicts data 8KB at a time if your buffer pool is small), and then finally the checkpointer writes back 8KB at a time. Another factor is that it might be using only one fanotify queue for each process, or worse, but IDK if that matters; it sounds like the CPU might be saturated anyway?

Future releases should improve all of that with bigger I/Os for WAL (currently read through an 8KB drinking straw; dunno if it's spying on reads too?) and data (I/O combining, various strategies, various prototypes[1][2], watch this space). It's also been proposed a few times that we should have an option to skip the end-of-recovery checkpoint, so then you'd get a regular "spread" checkpoint that the spyware could keep up with (assuming that it normally keeps up, just not in crash recovery).

Another thing that probably makes this worse in this strange environment, if we assume it is due to small writes and reads are not affected, is that crash recovery currently dirties all pages that the WAL touches, forgetting progress that already made it to disk: it overwrites the page (and its LSN) with an FPW and then replays all changes on top, when it could instead read the page in and skip a lot of work if the LSN is high enough, thereby often avoiding dirtying and re-writing the page, whenever checksums are on (as they are now by default). The checksum could be used as proof that the page wasn't torn by a non-atomic write interrupted by a power outage. (There's a rough sketch of that idea below.)

I doubt anyone is really that interested in optimising for such setups per se, when anyone will tell you to just turn it off. The reason I've thought about it enough to take a guess is that my corporate-managed Mac was running the PostgreSQL test suite so slowly that it would time out, and I was sufficiently nerd-sniped to figure out that it could keep up with bursts of I/O pretty well, but everything turned to custard under sustained workloads, notably in the recovery tests, which deliberately run with a tiny buffer pool. As someone working on bits of our I/O plumbing, I couldn't help speculating that something that is objectively terrible about PostgreSQL is really just being magnified by strange new overheads that mess with the economics. It may not be a goal, but I'll still be happy if it copes with this stuff as a by-product of general improvements like generalised I/O combining. (Funnily enough, I've actually got a bunch of unpublished tooling to simulate, detect and manage invisible I/O queuing.)
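Here's that sketch of the redo-skip idea as a stand-alone Python toy. To be clear, none of these types or helpers resemble PostgreSQL's actual redo code; it's only meant to show the shape of the logic:

from dataclasses import dataclass

@dataclass
class Page:
    lsn: int               # LSN stamped on the page when it was last written out
    checksum_ok: bool      # stand-in for verifying the on-disk page checksum

@dataclass
class WalRecord:
    lsn: int               # LSN of the record carrying a full-page image
    full_page_image: Page  # the FPW to restore

def replay_fpw(record: WalRecord, on_disk: Page) -> tuple[Page, bool]:
    """Return (resulting page, whether the buffer was dirtied)."""
    # Proposed short-circuit: if the on-disk page already reflects this record
    # (its LSN is at least the record's) and the checksum proves it wasn't torn,
    # there's nothing to redo and nothing to dirty or write back later.
    if on_disk.checksum_ok and on_disk.lsn >= record.lsn:
        return on_disk, False
    # Current behaviour: restore the full-page image and dirty the page, so the
    # end-of-recovery checkpoint has to write the whole thing out again.
    return record.full_page_image, True

if __name__ == "__main__":
    rec = WalRecord(lsn=100, full_page_image=Page(lsn=100, checksum_ok=True))
    print(replay_fpw(rec, Page(lsn=120, checksum_ok=True)))  # skipped, nothing dirtied
    print(replay_fpw(rec, Page(lsn=80, checksum_ok=False)))  # restored, dirtied as today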
> I wonder if anyone here has any experience with configuring exclusions so
> that the WAL files can be processed faster?

Yep, it entirely fixed the cliff and vastly reduced the CPU usage on my corporate Mac. There is still a small measurable slowdown, but without the exclusions the recovery test suite couldn't even complete without timing out while monitored. I expect exactly the same on Linux but haven't tried it.

> Any advice on what to communicate with their IT department about using this
> on their database servers? I've never encountered it on Linux before...

There is lots of writing on the internet about excluding pgdata from these types of tools. Much of it is concerned with Windows-specific problems: opening files and directories or mapping files at bad times can cause various PostgreSQL file operations to fail on that OS. I don't know of any reason why periodic scans of pgdata should interfere with PostgreSQL on Linux other than consuming I/O bandwidth; it seems to be just the per-syscall stuff that is unworkable.

You might be able to show "meson test" failing as some kind of evidence that PostgreSQL is allergic to it. Or, if you want a one-liner demonstration independent of PostgreSQL, you could test the can't-keep-up-with-a-stream-of-tiny-writes theory by experimenting with "dd" at different block sizes (there's a rough stand-alone version of that experiment below the links). I expect you'll find a size below which the fanotify queue quickly overflows and performance falls off a cliff. Current versions of PostgreSQL assume fast and consistent buffered writes and pretend the system calls are free; these monitoring tools make them expensive, and also non-linear, by sending messages around with carrier pigeons.

[1] https://www.postgresql.org/message-id/flat/CAAKRu_bcWRvRwZUop_d9vzF9nHAiT%2B-uPzkJ%3DS3ShZ1GqeAYOw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKGK1in4FiWtisXZ%2BJo-cNSbWjmBcPww3w3DBM%2BwhJTABXA%40mail.gmail.com
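In case it's easier than fiddling with dd flags, here's roughly the same tiny-writes experiment as a stand-alone Python script; the file name, total size and block sizes are arbitrary, so adjust to taste:

import os
import time

PATH = "write_test.dat"        # put this on the filesystem being monitored
TOTAL = 256 * 1024 * 1024      # write the same total amount for every block size

def timed_writes(block_size: int) -> float:
    buf = b"\x00" * block_size
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.monotonic()
    for _ in range(TOTAL // block_size):
        os.write(fd, buf)
    os.fsync(fd)
    os.close(fd)
    return time.monotonic() - start

for size in (8 * 1024, 128 * 1024, 1024 * 1024):
    print("block size %7d bytes: %.2fs" % (size, timed_writes(size)))
os.unlink(PATH)

If the theory is right, I'd expect the 8KB case to be disproportionately slower than the larger block sizes while the endpoint monitor is running, and only modestly slower without it.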
