Hi Thomas,

Apologies for uploading the wrong screenshot in the attachment. I have
uploaded the correct output of pg_test_fsync for your reference .

Please find our environment details as below:
Infra: Azure IaaS
OS: Red Hat Enterprise Linux Server release 7.9
File System: XFS
Disk: Premium SSD
IOPS: 5000 per Disk
Throughput: 200 MB/Sec per Disk
Disk Cache: Read+Write
Workload: OLTP

Actual issue is we are getting slow execution of Commit & Prepared Commit
statements only.  Execution of these statements is taking 2 to 14 seconds.
That's why we are analysing the pg_test_fsync output & FIO for further
troubleshooting but unable to find any right directions.



On Mon, 13 Dec 2021 at 10:45, Thomas Munro <thomas.mu...@gmail.com> wrote:

> On Mon, Dec 13, 2021 at 3:04 PM PGSQL DBA <pgsqldba.1...@gmail.com> wrote:
> > As you mentioned in question-8, "I'd investigate whether data is being
> cached unexpectedly, perhaps indicating that committed transactions be lost
> in a system crash event." So, I would like to know that if we configure the
> disk for the WALs with read+write disk cache then will it create any
> performance issue and show the attached output?
>
> Which OS and file system are you running and what kind of storage
> system is it?  If you can turn write caching on and off in your
> storage layer without affecting the ops/sec that's a pretty good clue
> that it's respecting the cache control commands that the OS sends.
> The numbers you showed are reassuringly low.  Hmm, I wonder why you
> have such a low number for non-sync'd writes (the last line).
>
> I have a concrete example of using this tool to learn something about
> an unfamiliar-to-me operating system, from this morning: I saw a
> system that reported ~6k IOPS for open_datasync and only ~600 IOPS for
> fdatasync.  With some limited knowledge of file systems I expect the
> latter to involve sending a SYNCHRONIZE CACHE command to flush device
> caches, and the former to do that, or use a more efficient FUA flag to
> flush just particular writes and not the whole write cache.  I didn't
> expect it to go 10x faster, so something had to be wrong.  With some
> more investigation and reading I learned that the storage drivers I
> was using do indeed ignore FUA cache control flags, so that
> wal_sync_method is not crash safe; if you lose power and come back up,
> you might lose a bunch of committed transactions from the end of the
> WAL.  I turned off write caches in the storage device, and then
> open_datasync became a lot slower, matching my expectations.
>
> These are primitive tools, but can help to check some assumptions...
>
> Obligatory remark: PostgreSQL 9.5 is out of support, please see
> https://www.postgresql.org/support/versioning/.
>
> > I also would like to know is there any best Practice from PostgreSQL
> which mentions what is the disk latency required for the WAL & DATA disk?
>
> No particular latency is required by PostgreSQL, and I don't have a
> general answer to this.  Is it fast enough for you?  What sort of
> workload is it running, OLTP, reporting, ...?
>

Reply via email to