Hi Thomas, Apologies for uploading the wrong screenshot in the attachment. I have uploaded the correct output of pg_test_fsync for your reference .
Please find our environment details as below: Infra: Azure IaaS OS: Red Hat Enterprise Linux Server release 7.9 File System: XFS Disk: Premium SSD IOPS: 5000 per Disk Throughput: 200 MB/Sec per Disk Disk Cache: Read+Write Workload: OLTP Actual issue is we are getting slow execution of Commit & Prepared Commit statements only. Execution of these statements is taking 2 to 14 seconds. That's why we are analysing the pg_test_fsync output & FIO for further troubleshooting but unable to find any right directions. On Mon, 13 Dec 2021 at 10:45, Thomas Munro <thomas.mu...@gmail.com> wrote: > On Mon, Dec 13, 2021 at 3:04 PM PGSQL DBA <pgsqldba.1...@gmail.com> wrote: > > As you mentioned in question-8, "I'd investigate whether data is being > cached unexpectedly, perhaps indicating that committed transactions be lost > in a system crash event." So, I would like to know that if we configure the > disk for the WALs with read+write disk cache then will it create any > performance issue and show the attached output? > > Which OS and file system are you running and what kind of storage > system is it? If you can turn write caching on and off in your > storage layer without affecting the ops/sec that's a pretty good clue > that it's respecting the cache control commands that the OS sends. > The numbers you showed are reassuringly low. Hmm, I wonder why you > have such a low number for non-sync'd writes (the last line). > > I have a concrete example of using this tool to learn something about > an unfamiliar-to-me operating system, from this morning: I saw a > system that reported ~6k IOPS for open_datasync and only ~600 IOPS for > fdatasync. With some limited knowledge of file systems I expect the > latter to involve sending a SYNCHRONIZE CACHE command to flush device > caches, and the former to do that, or use a more efficient FUA flag to > flush just particular writes and not the whole write cache. I didn't > expect it to go 10x faster, so something had to be wrong. With some > more investigation and reading I learned that the storage drivers I > was using do indeed ignore FUA cache control flags, so that > wal_sync_method is not crash safe; if you lose power and come back up, > you might lose a bunch of committed transactions from the end of the > WAL. I turned off write caches in the storage device, and then > open_datasync became a lot slower, matching my expectations. > > These are primitive tools, but can help to check some assumptions... > > Obligatory remark: PostgreSQL 9.5 is out of support, please see > https://www.postgresql.org/support/versioning/. > > > I also would like to know is there any best Practice from PostgreSQL > which mentions what is the disk latency required for the WAL & DATA disk? > > No particular latency is required by PostgreSQL, and I don't have a > general answer to this. Is it fast enough for you? What sort of > workload is it running, OLTP, reporting, ...? >