Re: PANIC: could not flush dirty data: Cannot allocate memory
Some more updates.

> Did this start after upgrading to 22.04? Or after a certain kernel upgrade?

It definitely only started with Ubuntu 22.04. We did not have, and still do not have, any issues on servers with Ubuntu 20.04 and 18.04. It also happens with Ubuntu 22.10 (kernel 5.19.0-23-generic). We are now trying the 6.0 mainline and 5.15 mainline kernels on some servers.

I also forgot to mention that the /var/lib/postgresql/12 directory is encrypted with fscrypt (ext4 encryption). So we also deactivated the directory encryption on one server to see whether the problem is related to encryption.

thanks
Klaus
Re: PANIC: could not flush dirty data: Cannot allocate memory
Hello all!

Thanks for the many hints on what to look for. We did some tuning and further debugging; here are the outcomes, answering all questions in a single email.

> In the meantime, you could experiment with setting checkpoint_flush_after to 0

We did this:

# SHOW checkpoint_flush_after;
 checkpoint_flush_after
------------------------
 0
(1 row)

But we STILL have PANICs. I tried to understand the code but failed. I guess that there are some code paths which call pg_flush_data() without checking this setting, or the check does not work.

> Did this start after upgrading to 22.04? Or after a certain kernel upgrade?

It definitely only started with Ubuntu 22.04. We did not have, and still do not have, any issues on servers with Ubuntu 20.04 and 18.04.

> I would believe that the kernel would raise a bunch of printks if it hit
> ENOMEM in the commonly used paths, so you would see something in dmesg or
> wherever you collect your kernel log if it happened where it was expected.

There is nothing in the kernel logs (dmesg).

> Do you use cgroups or such to limit memory usage of postgres?

No.

> Any uncommon options on the filesystem or the mount point?

No. Also no antivirus:

/dev/xvda2 / ext4 noatime,nodiratime,errors=remount-ro 0 1

or

LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1

> does this happen on all the hosts, or is it limited to one host or one
> technology?

It happens on XEN VMs, KVM VMs and VMware VMs, on Intel and AMD platforms.

> Another interesting thing would be to know the mount and file system
> options for the FS that triggers the failures. E.g.
# tune2fs -l /dev/sda1
tune2fs 1.46.5 (30-Dec-2021)
Filesystem volume name:   cloudimg-rootfs
Last mounted on:          /
Filesystem UUID:          0522e6b3-8d40-4754-a87e-5678a6921e37
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg encrypt sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              12902400
Block count:              26185979
Reserved block count:     0
Overhead clusters:        35096
Free blocks:              18451033
Free inodes:              12789946
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      243
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16128
Inode blocks per group:   1008
Flex block group size:    16
Filesystem created:       Wed Apr 20 18:31:24 2022
Last mount time:          Thu Nov 10 09:49:34 2022
Last write time:          Thu Nov 10 09:49:34 2022
Mount count:              7
Maximum mount count:      -1
Last checked:             Wed Apr 20 18:31:24 2022
Check interval:           0 (<none>)
Lifetime writes:          252 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
First orphan inode:       42571
Default directory hash:   half_md4
Directory Hash Seed:      c5ef129b-fbee-4f35-8f28-ad7cc93c1c43
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xb74ebbc3

Thanks
Klaus
Re: PANIC: could not flush dirty data: Cannot allocate memory
Hi,

On 2022-11-16 09:16:56 -0800, Andres Freund wrote:
> On 2022-11-15 13:23:56 +0100, klaus.mailingli...@pernau.at wrote:
> > Filesystem is ext4. VM technology is mixed: VMware, KVM and XEN PV.
> > Kernel is 5.15.0-52-generic.
> >
> > We have not seen this with Ubuntu 18.04 and 20.04 (although we might not
> > have noticed it).
>
> Did this start after upgrading to 22.04? Or after a certain kernel upgrade?
>
> Do you use cgroups or such to limit memory usage of postgres?
>
> It'd be helpful to see /proc/meminfo from one of the affected instances.

Another interesting thing would be to know the mount and file system options for the FS that triggers the failures. E.g. tune2fs -l path/to/blockdev and grep path/to/blockdev /proc/mounts

Greetings,

Andres Freund
Re: PANIC: could not flush dirty data: Cannot allocate memory
Hi,

On 2022-11-15 13:23:56 +0100, klaus.mailingli...@pernau.at wrote:
> Filesystem is ext4. VM technology is mixed: VMware, KVM and XEN PV.
> Kernel is 5.15.0-52-generic.
>
> We have not seen this with Ubuntu 18.04 and 20.04 (although we might not
> have noticed it).

Did this start after upgrading to 22.04? Or after a certain kernel upgrade?

Do you use cgroups or such to limit memory usage of postgres?

It'd be helpful to see /proc/meminfo from one of the affected instances.

Greetings,

Andres Freund
Re: PANIC: could not flush dirty data: Cannot allocate memory
## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at):

> AFAIU the problem is not related to the memory settings in
> postgresql.conf. It is the kernel that for whatever reasons reports
> ENOMEM. Correct?

Correct, there's an ENOMEM from the kernel when writing out data.

> Filesystem is ext4. VM technology is mixed: VMware, KVM and XEN PV.
> Kernel is 5.15.0-52-generic.

I do not suspect the filesystem per se; ext4 is quite common and we would have heard something about that (but then, someone's got to be the first reporter?). I would believe that the kernel would raise a bunch of printks if it hit ENOMEM in the commonly used paths, so you would see something in dmesg, or wherever you collect your kernel log, if it happened where it was expected.

And coming from the other side: does this happen on all the hosts, or is it limited to one host or one technology? Any uncommon options on the filesystem or the mount point? Anything which could mess with your block devices? (I'm especially thinking "antivirus", because it's always "0 days since the AV ate a database" and they tend to raise errors in the weirdest places, which would fit the bill here; but anything which is not "commonly in use everywhere" could be a candidate.)

Regards,
Christoph

-- 
Spare Space
Re: PANIC: could not flush dirty data: Cannot allocate memory
On Wed, Nov 16, 2022 at 1:24 AM wrote:
> Filesystem is ext4. VM technology is mixed: VMware, KVM and XEN PV.
> Kernel is 5.15.0-52-generic.
>
> We have not seen this with Ubuntu 18.04 and 20.04 (although we might not
> have noticed it).
>
> I guess upgrading to postgresql 13/14/15 does not help, as the problem
> happens in the kernel.
>
> Do you have any advice on how to go further? Shall I look out for certain
> kernel changes? In the kernel itself, or in the ext4 changelog?

It'd be good to figure out what is up with Linux or the tuning. I'll go write a patch to reduce that error level for non-EIO errors, to discuss for the next point release.

In the meantime, you could experiment with setting checkpoint_flush_after to 0, so the checkpointer/bgwriter/other backends don't call sync_file_range() all day long. That would have performance consequences for checkpoints which might be unacceptable, though: the checkpointer will fsync relations one after another, with less I/O concurrency. Linux is generally quite lazy at writing back dirty data, and doesn't know about our checkpointer's plan to fsync files on a certain schedule, which is why we ask it to start writeback on multiple files concurrently using sync_file_range().

https://www.postgresql.org/docs/15/runtime-config-wal.html#RUNTIME-CONFIG-WAL-CHECKPOINTS
Re: PANIC: could not flush dirty data: Cannot allocate memory
Thanks all for digging into this problem.

AFAIU the problem is not related to the memory settings in postgresql.conf. It is the kernel that for whatever reasons reports ENOMEM. Correct?

On 2022-11-14 22:54, Christoph Moench-Tegeder wrote:
> ## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at):
> > On several servers we see the error message: PANIC: could not flush
> > dirty data: Cannot allocate memory
>
> As far as I can see, that "could not flush dirty data" happens a total of
> three times in the code - there are other places where postgresql could
> PANIC on fsync()-and-stuff-related issues, but they have different
> messages. Of these three places, there's a sync_file_range(), a
> posix_fadvise() and an msync(), all in src/backend/storage/file/fd.c.
> "Cannot allocate memory" would be ENOMEM, which posix_fadvise() does not
> return (as per its docs). So this would be sync_file_range(), which could
> run out of memory (as per the manual), or msync(), where ENOMEM actually
> means "The indicated memory (or part of it) was not mapped". Both cases
> are somewhat WTF for this setup.
>
> What filesystem are you running?

Filesystem is ext4. VM technology is mixed: VMware, KVM and XEN PV. Kernel is 5.15.0-52-generic.

We have not seen this with Ubuntu 18.04 and 20.04 (although we might not have noticed it).

I guess upgrading to postgresql 13/14/15 does not help, as the problem happens in the kernel.

Do you have any advice on how to go further? Shall I look out for certain kernel changes? In the kernel itself, or in the ext4 changelog?

Thanks
Klaus
Re: PANIC: could not flush dirty data: Cannot allocate memory
Thomas Munro writes:
> It has been argued before that we might have been over-zealous
> applying the PANIC promotion logic to sync_file_range(). It's used to
> start asynchronous writeback to make the later fsync() call fast, so
> it's "only a hint", but I have no idea if it could report a writeback
> error from the kernel that would then be consumed and not reported to
> the later fsync(), so I defaulted to assuming that it could.

Certainly, if it reports EIO, we should panic. But maybe not for ENOMEM? One would assume that that means the request didn't get queued for lack of in-kernel memory space ... in which case "nothing happened".

regards, tom lane
Re: PANIC: could not flush dirty data: Cannot allocate memory
On Tue, Nov 15, 2022 at 10:54 AM Christoph Moench-Tegeder wrote:
> ## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at):
> > On several servers we see the error message: PANIC: could not flush
> > dirty data: Cannot allocate memory
>
> Of these three places, there's a sync_file_range(), a posix_fadvise()
> and an msync(), all in src/backend/storage/file/fd.c. "Cannot allocate
> memory" would be ENOMEM, which posix_fadvise() does not return (as per
> its docs). So this would be sync_file_range(), which could run out
> of memory (as per the manual), or msync(), where ENOMEM actually means
> "The indicated memory (or part of it) was not mapped". Both cases are
> somewhat WTF for this setup.

It must be sync_file_range(). The others are fallbacks that wouldn't apply on a modern Linux.

It has been argued before that we might have been over-zealous applying the PANIC promotion logic to sync_file_range(). It's used to start asynchronous writeback to make the later fsync() call fast, so it's "only a hint", but I have no idea if it could report a writeback error from the kernel that would then be consumed and not reported to the later fsync(), so I defaulted to assuming that it could.
Re: PANIC: could not flush dirty data: Cannot allocate memory
## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at):

> On several servers we see the error message: PANIC: could not flush
> dirty data: Cannot allocate memory

As far as I can see, that "could not flush dirty data" happens a total of three times in the code - there are other places where postgresql could PANIC on fsync()-and-stuff-related issues, but they have different messages. Of these three places, there's a sync_file_range(), a posix_fadvise() and an msync(), all in src/backend/storage/file/fd.c. "Cannot allocate memory" would be ENOMEM, which posix_fadvise() does not return (as per its docs). So this would be sync_file_range(), which could run out of memory (as per the manual), or msync(), where ENOMEM actually means "The indicated memory (or part of it) was not mapped". Both cases are somewhat WTF for this setup.

What filesystem are you running?

Regards,
Christoph

-- 
Spare Space
Re: PANIC: could not flush dirty data: Cannot allocate memory
klaus.mailingli...@pernau.at writes:
> On several servers we see the error message: PANIC: could not flush
> dirty data: Cannot allocate memory

What that's telling you is that fsync (or some equivalent OS call) returned ENOMEM, which would seem to be a kernel-level deficiency. Perhaps you could dodge it by using a different wal_sync_method setting, but complaining to your kernel vendor seems like the main thing to be doing.

The reason we treat it as a PANIC condition is:

 * Failure to fsync any data file is cause for immediate panic, unless
 * data_sync_retry is enabled.  Data may have been written to the operating
 * system and removed from our buffer pool already, and if we are running on
 * an operating system that forgets dirty data on write-back failure, there
 * may be only one copy of the data remaining: in the WAL.  A later attempt to
 * fsync again might falsely report success.  Therefore we must not allow any
 * further checkpoints to be attempted.  data_sync_retry can in theory be
 * enabled on systems known not to drop dirty buffered data on write-back
 * failure (with the likely outcome that checkpoints will continue to fail
 * until the underlying problem is fixed).

As noted here, turning on data_sync_retry would reduce the PANIC to a WARNING. But I wouldn't recommend that without some assurances from your kernel vendor about what happens in the kernel after such a failure. The panic restart should (in theory) ensure data consistency is preserved; without it we can't offer any guarantees.

regards, tom lane
PANIC: could not flush dirty data: Cannot allocate memory
Hi all!

We have a setup with a master and plenty of logical replication slaves. Master and slaves are 12.12-1.pgdg22.04+1 running on Ubuntu 22.04.

SELECT pg_size_pretty( pg_database_size('regdns') ); reports from 25GB (freshly installed slave) to 42GB (probably bloat).

Replication slave VMs have between 22G and 48G RAM; most have 48G RAM.

We are using:

maintenance_work_mem = 128MB
work_mem = 64MB

and on VMs with 48G RAM:

effective_cache_size = 8192MB
shared_buffers = 6144MB

and on VMs with 22G RAM:

effective_cache_size = 4096MB
shared_buffers = 2048MB

On several servers we see the error message: PANIC: could not flush dirty data: Cannot allocate memory

Unfortunately I do not find any reference to this kind of error. Can you please describe what happens here in detail? Is it related to server memory? Or to our memory settings?

I am not so surprised that it happens on the 22G RAM VMs. It is not happening on our 32G RAM VMs. But it also happens on some of the 48G RAM VMs, which should have plenty of RAM available:

# free -h
               total        used        free      shared  buff/cache   available
Mem:            47Gi         9Gi       1.2Gi       6.1Gi        35Gi        30Gi
Swap:          7.8Gi       3.0Gi       4.9Gi

Of course I could upgrade all our VMs and then wait and see if that solves the problem. But I would like to understand what is happening here before spending $$$.

Thanks
Klaus