Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions

Aidar Imamov Sat, 05 Apr 2025 10:21:51 -0700

Joseph Koshakow 2025-03-21 01:25:

Hi I am working with Aidar to give a review and I am also a beginner
reviewer.

From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Date: Fri, 20 Dec 2024 14:06:47 +0300
Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for

testing

  */
 bool
-EvictUnpinnedBuffer(Buffer buf)
+EvictUnpinnedBuffer(Buffer buf, bool *flushed)


I think you should update the comment above this function to include
details on the new `flushed` argument.

diff --git a/doc/src/sgml/pgbuffercache.sgml

b/doc/src/sgml/pgbuffercache.sgml

index 802a5112d77..83950ca5cce 100644
--- a/doc/src/sgml/pgbuffercache.sgml
+++ b/doc/src/sgml/pgbuffercache.sgml
@@ -31,8 +31,10 @@
   This module provides the

<function>pg_buffercache_pages()</function>

   function (wrapped in the <structname>pg_buffercache</structname>

view),

   the <function>pg_buffercache_summary()</function> function, the
-  <function>pg_buffercache_usage_counts()</function> function and
-  the <function>pg_buffercache_evict()</function> function.
+  <function>pg_buffercache_usage_counts()</function> function, the
+  <function>pg_buffercache_evict()</function>, the
+  <function>pg_buffercache_evict_relation()</function>, and the
+  <function>pg_buffercache_evict_all()</function> function.


All the other functions have indexterm tags above this, should we add
indexterms for the new functions?

+ <sect2 id="pgbuffercache-pg-buffercache-evict-relation">
+  <title>The <structname>pg_buffercache_evict_relation</structname>

Function</title>

+  <para>
+   The <function>pg_buffercache_evict_relation()</function>

function is very similar

+   to <function>pg_buffercache_evict()</function> function.  The

difference is that

+   <function>pg_buffercache_evict_relation()</function> takes a

relation

+   identifier instead of buffer identifier.  Then, it tries to

evict all

+   buffers in that relation.  The function is intended for

developer testing only.

+  </para>
+ </sect2>
+
+ <sect2 id="pgbuffercache-pg-buffercache-evict-all">
+  <title>The <structname>pg_buffercache_evict_all</structname>

Function</title>

+  <para>
+   The <function>pg_buffercache_evict_all()</function> function is

very similar

+   to <function>pg_buffercache_evict()</function> function.  The

difference is,

+   the <function>pg_buffercache_evict_all()</function> does not

take argument;

+   instead it tries to evict all buffers in the buffer pool.  The

function is

+   intended for developer testing only.
+  </para>
+ </sect2>


The other difference is that these new functions have an additional
output argument to indicate if the buffer was flushed which we
probably
want to mention here.

Also I think it's more gramatically correct to say "the <fn-name>
function" or just "<fn-name>". A couple of times you say "<fn-name>
function" or "the <fn-name>".

Also I think the third to last sentence should end with "... does not
take **an** argument" or "... does not take argument**s**".

+CREATE FUNCTION pg_buffercache_evict_relation(
+    IN regclass,
+    IN fork text default 'main',
+    OUT buffers_evicted int4,
+    OUT buffers_flushed int4)
+AS 'MODULE_PATHNAME', 'pg_buffercache_evict_relation'
+LANGUAGE C PARALLEL SAFE VOLATILE;
+
+CREATE FUNCTION pg_buffercache_evict_all(
+    OUT buffers_evicted int4,
+    OUT buffers_flushed int4)
+AS 'MODULE_PATHNAME', 'pg_buffercache_evict_all'
+LANGUAGE C PARALLEL SAFE VOLATILE;


Does it make sense to also update pg_buffercache_evict to also return
a bool if the buffer was flushed? Or is that too much of a breaking
change?

+ /* Lock the header and check if it's valid. */
+ buf_state = LockBufHdr(desc);
+ if ((buf_state & BM_VALID) == 0)
+ {
+ UnlockBufHdr(desc, buf_state);
+ return false;
+ }


EvictUnpinnedBuffer first checks if the buffer is locked before
attempting to lock it. Do we need a similar check here?

   /* Lock the header if it is not already locked. */
   if (!buf_state)
    buf_state = LockBufHdr(desc);

I don't think *flushed is necessarily accurate this way, as it

might detect

that it doesn't need to flush the data (via StartBufferIO()

returning false).


You are right. It seems there is no way to get that information
without editing the FlushBuffer(), right? And I think editing
FlushBuffer() is not worth it. So, I can either remove it or explain
in the docs that #buffers_flushed may not be completely accurate.


There's already a lot of caveats on EvictUnpinnedBuffer that it might
have unexpected results when run concurrently, so I think it's fine to
add one more.

Thanks,
Joe Koshakow


I agree with most of what Joseph said. However, I would like to add some
comments.

At the moment, the "flushed" flag essentially indicates whether thebufferwas dirty at the time of eviction and it does not guarantee that it hasbeenwritten to disk. Therefore, it would be better to rename thebuffers_flushed

column in the output of pg_buffer_cache_evict_all() and

pg_buffercache_evict_relation() functions to dirty_buffers mb? Thiswouldallow us to avoid the confusion that arises from the fact that not alldirtybuffers could have actually been written to disk. In addition, thiswould

remove the "flushed" parameter from the EvictUnpinnedBuffer() function.

Because if we explicitly call LockBufHdr() outside ofEvictUnpinnedBuffer(),

we can already know in advance whether the buffer is dirty or not.

The same applies to the suggestion to retrieve "flushed" count from the
pg_buffercache_evict() call. We cannot say this for certain, but we can
determine whether the buffer was dirty.

regards,
Aidar Imamov

Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions

Reply via email to