On Mon, Oct 6, 2025 at 12:57:20PM -0400, Bruce Momjian wrote:
> On Mon, Oct 6, 2025 at 11:14:13AM -0400, Andres Freund wrote:
> > I'd guess that the *vast* majority of PG workloads these days run on
> > networked block storage. For those typically the actual latency at the
> > storage level is a rather small fraction of the overall IO latency, which
> > is instead dominated by network and other related cost (like the
> > indirection to which storage system to go to and crossing VM/host
> > boundaries). Because the majority of the IO latency is not affected by
> > the storage latency, but by network latency, the random IO/non-random IO
> > difference will play less of a role.
>
> Yes, the last time we discussed changing the default random page cost,
> September 2024, the argument was that while SSDs should be < 4, cloud
> storage might be > 4, so 4 was still a good value:
>
>
> https://www.postgresql.org/message-id/flat/877caxaxt6.fsf%40wibble.ilmari.org#8a10b7b8cf05410291d076f8def58c29
>
> Add in cache effects for all of these storage devices as outlined in our
> docs.

I rewrote the random_page_cost docs, attached, to remove a focus on
magnetic disk, and added network latency as a reason for
random_page_cost being low. I removed the specific caching numbers and
went with a more generic description.

I would normally apply this only to master, but given the complaints in
this thread, maybe I should backpatch it.
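
For anyone comparing the old and new wording, the caching arithmetic that the
removed text described (random access modeled as 40 times slower than
sequential, with 90% of random reads cached) can be sketched roughly as below.
This is only an illustration: the function name is hypothetical, it assumes a
cached read costs nothing, and it is not how the planner computes anything.

```python
def effective_random_page_cost(raw_ratio: float, miss_rate: float) -> float:
    """Blend an uncached random/sequential cost ratio with a cache miss rate.

    Assumption for illustration only: a cached random read is treated as
    free, so only the cache misses pay the raw random-access penalty.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return raw_ratio * miss_rate


# The figures from the removed doc text: 40x raw penalty, 90% cached,
# which means 10% of random reads miss the cache.
old_default = effective_random_page_cost(raw_ratio=40.0, miss_rate=0.10)
print(f"modeled random_page_cost: {old_default:.1f}")
```

Under the same simplification, a lower miss rate or a smaller raw ratio (as
with network-attached storage, where latency is dominated by the network)
pushes the modeled value down, which is the direction the new text describes.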
--
Bruce Momjian <[email protected]> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 39e658b..49e0801
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** ANY <replaceable class="parameter">num_s
*** 5924,5947 ****
</para>
<para>
! Random access to mechanical disk storage is normally much more expensive
! than four times sequential access. However, a lower default is used
! (4.0) because the majority of random accesses to disk, such as indexed
! reads, are assumed to be in cache. The default value can be thought of
! as modeling random access as 40 times slower than sequential, while
! expecting 90% of random reads to be cached.
</para>
<para>
! If you believe a 90% cache rate is an incorrect assumption
! for your workload, you can increase random_page_cost to better
! reflect the true cost of random storage reads. Correspondingly,
! if your data is likely to be completely in cache, such as when
! the database is smaller than the total server memory, decreasing
! random_page_cost can be appropriate. Storage that has a low random
! read cost relative to sequential, e.g., solid-state drives, might
! also be better modeled with a lower value for random_page_cost,
! e.g., <literal>1.1</literal>.
</para>
<tip>
--- 5924,5947 ----
</para>
<para>
! Random access to durable storage is normally much more expensive
! than four times sequential access. However, a lower default is
! used (4.0) because the majority of random accesses to storage,
! such as indexed reads, are assumed to be in cache. Also, the
! latency of network-attached storage tends to reduce the relative
! overhead of random access.
</para>
<para>
! If you believe caching is less frequent than the default
! value reflects, and network latency is minimal, you can increase
! random_page_cost to better reflect the true cost of random storage
! reads. Storage that has a higher random read cost relative to
! sequential, like magnetic disks, might also be better modeled with
! a higher value for random_page_cost. Correspondingly, if your data
! is likely to be completely in cache, such as when the database
! is smaller than the total server memory, or network latency is
! high, decreasing random_page_cost might be appropriate.
</para>
<tip>