On 7/22/25 14:43, Greg Burd wrote:
> On 7/21/25 14:35, Andres Freund wrote:
>> Hi,
>>
>> On 2025-07-21 13:37:04 -0400, Greg Burd wrote:
>>> On 7/18/25 13:03, Andres Freund wrote:
>>> Hello.  Thanks again for taking the time to review the email and patch,
>>> I think we're onto something good here.
>>>
>> I'd be curious if anybody wants to argue for keeping the clock sweep.
>> Except for the have_free_buffer() use in autoprewarm, it's a rather
>> trivial patch. And I really couldn't measure regressions above the noise
>> level, even if absurdly extreme use cases.
>>> Hmmm... was "argue for keeping the clock sweep" supposed to read "argue
>>> for keeping the freelist"?
>> Err, yes :(
> Phew.  :)  No worries.
>
>>>> On 2025-07-17 14:35:13 -0400, Greg Burd wrote:
>>>>> On Fri, Jul 11, 2025, at 2:52 PM, Andres Freund wrote:
>>>>>> I think we'll likely need something to replace it.
>>>>> Fair, this (v5) patch doesn't yet try to address this.
>>>>>
>>>>>> TBH, I'm not convinced that autoprewarm using have_free_buffer() is quite
>>>>>> right. The goal of the use have_free_buffer() is obviously to stop
>>>>>> prewarming shared buffers if doing so would just evict buffers.  But it's
>>>>>> not clear to me that we should just stop when there aren't any free
>>>>>> buffers - what if the previous buffer contents aren't the right ones?
>>>>>> It'd make more sense to me to stop autoprewarm once NBuffers have been
>>>>>> prewarmed...
>>>>> I had the same high level reaction, that autoprewarm was leveraging
>>>>> something convenient but not necessarily required or even correct.  I'd
>>>>> considered using NBuffers as you describe due to similar intuitions, I'll
>>>>> dig into that idea for the next revision after I get to know autoprewarm
>>>>> a bit better.
>>>> Cool. I do think that'll be good enough.
>>> I re-added the have_free_buffer() function only now it returns false
>>> once nextVictimBuffer > NBuffers signaling to autoprewarm that the clock
>>> has made its first complete pass.  With that I reverted my changes in
>>> the autoprewarm module.  The net should be the same behavior as before
>>> at startup when using that module.
>> I don't think we should have a have_free_buffer() that doesn't actually test
>> whether we have a free buffer, that seems too likely to cause
>> misunderstandings down the line.  What if we instead just limit the amount of
>> buffers we load in apw_load_buffers()? apw_load_buffers() knows NBuffers and
>> the number of to-be-loaded buffers, so that shouldn't be hard.
> I'm glad you said that, I wasn't thrilled with that either and I'm not
> sure why I didn't just correct for that in the last patch set.  I'm now
> capping num_elements to NBuffers at most.
>
>>>>> Meanwhile, the tests except for Windows pass [2] for this new patch [3].
>>>>> I'll dig into the Windows issues next week as well.
>>>> FWIW, there are backtraces generated on windows. E.g.
>>>>
>>>> https://api.cirrus-ci.com/v1/artifact/task/6327899394932736/crashlog/crashlog-postgres.exe_00c0_2025-07-17_19-19-00-008.txt
>>>>
>>>> 000000cd`827fdea0 00007ff7`6ad82f88     ucrtbased!abort(void)+0x5a 
>>>> [minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 77]
>>>> 000000cd`827fdee0 00007ff7`6aae2b7c     postgres!ExceptionalCondition(
>>>>                    char * conditionName = 0x00007ff7`6b2a4cb8 "result < 
>>>> NBuffers",
>>>>                    char * fileName = 0x00007ff7`6b2a4c88 
>>>> "../src/backend/storage/buffer/freelist.c",
>>>>                    int lineNumber = 0n139)+0x78 
>>>> [c:\cirrus\src\backend\utils\error\assert.c @ 67]
>>>> 000000cd`827fdf20 00007ff7`6aae272c     postgres!clock_modulo(
>>>>                    unsigned int64 counter = 0x101)+0x6c 
>>>> [c:\cirrus\src\backend\storage\buffer\freelist.c @ 139]
>>>> 000000cd`827fdf60 00007ff7`6aad8647     postgres!StrategySyncStart(
>>>>                    unsigned int * complete_passes = 0x000000cd`827fdfc0,
>>>>                    unsigned int * num_buf_alloc = 
>>>> 0x000000cd`827fdfcc)+0x2c [c:\cirrus\src\backend\storage\buffer\freelist.c 
>>>> @ 300]
>>>> 000000cd`827fdfa0 00007ff7`6aa254a3     postgres!BgBufferSync(
>>>>                    struct WritebackContext * wb_context = 
>>>> 0x000000cd`827fe180)+0x37 [c:\cirrus\src\backend\storage\buffer\bufmgr.c @ 
>>>> 3649]
>>>> 000000cd`827fe030 00007ff7`6aa278a7     postgres!BackgroundWriterMain(
>>>>                    void * startup_data = 0x00000000`00000000,
>>>>                    unsigned int64 startup_data_len = 0)+0x243 
>>>> [c:\cirrus\src\backend\postmaster\bgwriter.c @ 236]
>>>> 000000cd`827ff5a0 00007ff7`6a8daf19     postgres!SubPostmasterMain(
>>>>                    int argc = 0n3,
>>>>                    char ** argv = 0x0000028f`e75d24d0)+0x2f7 
>>>> [c:\cirrus\src\backend\postmaster\launch_backend.c @ 714]
>>>> 000000cd`827ff620 00007ff7`6af0f5a9     postgres!main(
>>>>                    int argc = 0n3,
>>>>                    char ** argv = 0x0000028f`e75d24d0)+0x329 
>>>> [c:\cirrus\src\backend\main\main.c @ 222]
>>>>
>>>> I.e. your new assertion failed for some reason that i can't *immediately* 
>>>> see.
>>> I put that in as a precaution and as a way to communicate the intention
>>> of the other code above it.  I never imagined it would assert.  I've
>>> changed clock_read() to only assert when the modulo differs and left
>>> that assert in the calling ClockSweepTick() function because it was
>>> redundant and I'm curious to see if we see a similar assert when testing
>>> the modulo.
>> Do you understand why it triggered? Because I don't immediately. The fact
>> that it triggered only on windows, where the compiler is rather different,
>> makes it worth understanding imo.
> I dug into the ASM for both GCC 15.1 and MSVC 19.latest (thanks
> godbolt.org!) for x86_64 and there was a critical difference.  It starts
> with the fact that I'd used uint32 for my NBuffersPow2Mask rather than
> uint64.  That then translates to two different compiled outputs for
> clock_read() (was: clock_modulo()).
>
> gcc-15.1 -O2
>
> clock_read(unsigned long long):
>         and     edi, DWORD PTR NBuffersPow2Mask[rip]
>         mov     edx, DWORD PTR NBuffers[rip]
>         mov     rax, rdi
>         sub     rax, rdx
>         cmp     rdi, rdx
>         cmovb   rax, rdi
>         ret
>
> msvc-19.latest /O2
>
> hand$ = 8
> unsigned int clock_read(unsigned __int64) PROC                   ; clock_read, COMDAT
>         mov     edx, ecx
>         and     rdx, QWORD PTR unsigned __int64 NBuffersPow2Mask ; NBuffersPow2Mask
>         mov     ecx, DWORD PTR unsigned int NBuffers             ; NBuffers
>         mov     eax, edx
>         sub     eax, ecx
>         cmp     rdx, rcx
>         cmovb   eax, edx
>         ret     0
> unsigned int clock_read(unsigned __int64) ENDP                   ; clock_read
>
>
> Here's what I think was happening: the MSVC compiler produced assembly
> for "hand & NBuffersPow2Mask" that uses "rdx QWORD" while GCC uses "edi
> DWORD".  The 32-bit AND operation (edi) automatically zeros the upper 32
> bits of rdi after performing the AND with the uint64 value of hand, while
> "rdx QWORD" does not, potentially leaving some of the upper 32 bits set.
> My guess is that on Windows, once the value of the clock hand exceeded
> UINT32_MAX (which can happen in as little as 3 seconds in a tight loop,
> but likely took longer in the test run), the bits not masked out would
> inflate the resulting value, which would be > NBuffers and also differ
> from the simple modulo calculation, causing the failed assertion.
>
> Changing the value of NBuffersPow2Mask to uint64 and using similarly
> sized types in these functions more closely aligns the assembly code and
> should fix this.

Widening NBuffersPow2Mask to uint64 did make the code more correct, but it
didn't fix that particular bug.  The bug was in the logic of my optimized
modulo.  A big thank you to my PostgreSQL committer mentor Noah Misch, who
asked a simple question yesterday in our monthly call: "if this algorithm
is correct, why isn't it how CPUs and/or compilers implement modulo?"  He
then suggested that I write an exhaustive test for it (maybe even coerce
an LLM to do that for me) and run it.  So I did, and the algorithm was
indeed wrong.

I felt I'd give it one more try.  It turns out there is an algorithm [1]
based on some work that went into GMP a while ago [2], so I implemented it
and gave it a try (attached test.c).  Without optimization it's 10-42%
faster, but when compiled with GCC -O3 or -Ofast that advantage goes away
and it is slightly slower.  So, simplicity for the win and let the
compiler do the hard parts, I guess.
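
For anyone who doesn't want to wade through the whole attachment, the kernel
of it is below, condensed from ClockSweepInit() and fast_mod() in test.c (the
wrapper name here is mine; the arithmetic is lifted from the attachment):

    #include <stdint.h>

    /* n % size via a precomputed per-divisor reciprocal ("magic" constant) */
    static inline uint32_t
    magic_mod(uint32_t n, uint32_t size)
    {
        /* ceil(2^32 / size); test.c computes this once per divisor */
        uint64_t magic = ((1ULL << 32) + size - 1) / size;
        uint32_t quotient = (uint32_t) (((uint64_t) n * magic) >> 32);
        uint32_t remainder = n - quotient * size;

        /* the estimate can be off by one step of the divisor, so fix up */
        return remainder < size ? remainder : remainder - size;
    }

Power-of-2 sizes still take the simple mask path in test.c; the magic path is
only the non-power-of-2 case the numbers above refer to.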

>>>>> @@ -3664,15 +3664,25 @@ BgBufferSync(WritebackContext *wb_context)
>>>>>
>>>>>   /*
>>>>>    * Compute strategy_delta = how many buffers have been scanned by the
>>>>> -  * clock sweep since last time.  If first time through, assume none. Then
>>>>> -  * see if we are still ahead of the clock sweep, and if so, how many
>>>>> +  * clock-sweep since last time.  If first time through, assume none. Then
>>>>> +  * see if we are still ahead of the clock-sweep, and if so, how many
>>>>>    * buffers we could scan before we'd catch up with it and "lap" it. Note:
>>>>>    * weird-looking coding of xxx_passes comparisons are to avoid bogus
>>>>>    * behavior when the passes counts wrap around.
>>>>>    */
>>>>>   if (saved_info_valid)
>>>>>   {
>>>>> -         int32           passes_delta = strategy_passes - prev_strategy_passes;
>>>>> +         int32           passes_delta;
>>>>> +
>>>>> +         if (unlikely(prev_strategy_passes > strategy_passes))
>>>>> +         {
>>>>> +                 /* wrap-around case */
>>>>> +                 passes_delta = (int32) (UINT32_MAX - prev_strategy_passes + strategy_passes);
>>>>> +         }
>>>>> +         else
>>>>> +         {
>>>>> +                 passes_delta = (int32) (strategy_passes - prev_strategy_passes);
>>>>> +         }
>>>>>
>>>>>           strategy_delta = strategy_buf_id - prev_strategy_buf_id;
>>>>>           strategy_delta += (long) passes_delta * NBuffers;
>>>> That seems somewhat independent of the rest of the change, or am I missing 
>>>> something?
>>> That change is there to cover the possibility of someone managing to
>>> overflow and wrap a uint64 which is *highly* unlikely.
>> That risk existed previously too - I'm not against shoring things up, I'd
>> just do it in a precursor commit, to make this easier to review.
>>
>>> If this degree of paranoia isn't required I'm happy to remove it.
>> That does indeed seem really unlikely.  Assuming that postgres stays up for
>> 10 years without a single restart, it'd be ~59 billion ticks a second.
> Agreed, it's overkill.
>> I don't mind a defense, but I think we'd be better off putting it into
>> ClockSweepTick() or such, simply erroring out if we ever hit this. It's
>> unlikely that we'd get (and keep) all the relevant untested code correct ime.
>> Then we also can assert that prev_strategy_passes <= strategy_passes.
> Added assertions and comments to explain decision.
>> Greetings,
>>
>> Andres Freund
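
As a sanity check on that ~59 billion number, here's the arithmetic (just a
throwaway snippet, assuming a 64-bit counter starting from zero and ten
years of uninterrupted uptime):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        double seconds = 10.0 * 365.25 * 24 * 3600;   /* ~3.16e8 seconds in 10 years */

        /* prints ~58.5: billions of ticks/second needed to exhaust a 64-bit counter */
        printf("%.1f billion ticks/second\n",
               (double) UINT64_MAX / seconds / 1e9);
        return 0;
    }

So the Assert(prev_strategy_passes <= strategy_passes) really is just belt
and suspenders.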

Patch set is now:

1) remove freelist

2) remove buffer_strategy_lock (the new bookkeeping is sketched after this list)

3) abstract clock-sweep to type and API
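
The gist of (2), with hypothetical names (the real patch keeps the hand in
BufferStrategyControl as a pg_atomic_uint64 and advances it with
pg_atomic_fetch_add_u64; this is only a sketch of the arithmetic):

    #include <stdint.h>

    /* One monotonically increasing 64-bit hand; everything else is derived. */

    static inline uint32_t
    sweep_victim(uint64_t hand, uint32_t nbuffers)
    {
        return (uint32_t) (hand % nbuffers);    /* index of the buffer to consider */
    }

    static inline uint64_t
    sweep_passes(uint64_t hand, uint32_t nbuffers)
    {
        return hand / nbuffers;                 /* complete passes of the clock so far */
    }

A single fetch-add advances the hand, and both the victim index and the
complete-pass count fall out of the value it returns, so no spinlock is
needed around the sweep.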


Rebased v10 onto 5457ea46d18 at GitHub[3] and CommitFest[4],


-greg


[1] https://stackoverflow.com/a/26047426/366692

[2] https://gmplib.org/~tege/divcnst-pldi94.pdf

[3] https://github.com/gburd/postgres/pull/10

[4] https://commitfest.postgresql.org/patch/5928/
// CFLAGS="-O3" make test
//
// PostgreSQL's max value for shared_buffers is 1073741823,
// min is 16, default is 16384.
//
// ./test 16 31 64 100 128 512 1023 1024 16384 1073741823

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <pthread.h>
#include <assert.h>

// Thread-safe atomic operations using GCC built-ins
typedef struct {
    volatile uint64_t value;
} pg_atomic_uint64;

static inline void
pg_atomic_init_u64(pg_atomic_uint64 *ptr, uint64_t val) {
    ptr->value = val;
}

static inline uint64_t
pg_atomic_read_u64(pg_atomic_uint64 *ptr) {
    return __atomic_load_n(&ptr->value, __ATOMIC_SEQ_CST);
}

static inline uint64_t
pg_atomic_fetch_add_u64(pg_atomic_uint64 *ptr, uint64_t add_) {
    return __atomic_fetch_add(&ptr->value, add_, __ATOMIC_SEQ_CST);
}

// ClockSweep structure with magic constant support
typedef struct {
    pg_atomic_uint64 counter;
    uint32_t size;
    uint32_t mask;           // For power-of-2 sizes
    uint64_t magic;          // Magic constant for non-power-of-2
    int use_mask;            // 1 for power-of-2, 0 for magic modulo
} ClockSweep;

// Fast modulo using magic constant
static inline uint32_t
fast_mod(uint32_t n, uint32_t divisor, uint64_t magic) {
    // Compute quotient using magic multiplication
    uint32_t quotient = (uint32_t)(((uint64_t)n * magic) >> 32);

    // Compute remainder
    uint32_t remainder = n - quotient * divisor;

    // Adjust if remainder is too large (can only be off by divisor)
    return remainder < divisor ? remainder : remainder - divisor;
}

// Initialize ClockSweep with magic constant calculation
static void
ClockSweepInit(ClockSweep *sweep, uint32_t size) {
    pg_atomic_init_u64(&sweep->counter, 0);
    sweep->size = size;

    // Check if size is power of 2
    if ((size & (size - 1)) == 0) {
	// Power of 2: use simple mask
	sweep->mask = size - 1;
	sweep->magic = 0;  // Unused
	sweep->use_mask = 1;
    } else {
	// Non-power of 2: calculate magic constant
	sweep->mask = 0;   // Unused
	sweep->magic = ((1ULL << 32) + size - 1) / size;  // Ceiling division
	sweep->use_mask = 0;
    }
}

// Get current position without advancing
static inline uint32_t
ClockSweepPosition(ClockSweep *sweep) {
    uint64_t current = pg_atomic_read_u64(&sweep->counter);
    uint32_t counter32 = (uint32_t)current;

    if (sweep->use_mask) {
	// Power of 2: use mask
	return counter32 & sweep->mask;
    } else {
	// Non-power of 2: use magic modulo
	return fast_mod(counter32, sweep->size, sweep->magic);
    }
}

// Advance the counter; returns the position it pointed at before the advance
static inline uint32_t
ClockSweepTick(ClockSweep *sweep) {
    uint64_t current = pg_atomic_fetch_add_u64(&sweep->counter, 1);
    uint32_t counter = (uint32_t)current;

    if (sweep->use_mask) {
	// Power of 2: use mask
	return counter & sweep->mask;
    } else {
	// Non-power of 2: use magic modulo
	return fast_mod(counter, sweep->size, sweep->magic);
    }
}

// Get number of complete cycles
static inline uint64_t
ClockSweepCycles(ClockSweep *sweep) {
    uint64_t current = pg_atomic_read_u64(&sweep->counter);
    return current / sweep->size;
}

// Baseline implementation using modulo operator for comparison
typedef struct {
    pg_atomic_uint64 counter;
    uint32_t size;
} ClockSweepBaseline;

static void
ClockSweepBaselineInit(ClockSweepBaseline *sweep, uint32_t size) {
    pg_atomic_init_u64(&sweep->counter, 0);
    sweep->size = size;
}

static inline uint32_t
ClockSweepBaselineTick(ClockSweepBaseline *sweep) {
    uint64_t current = pg_atomic_fetch_add_u64(&sweep->counter, 1);
    uint32_t counter = (uint32_t)current;

    return counter % sweep->size;  // Standard modulo
}

// Test data structure for multi-threading
typedef struct {
    ClockSweep *sweep;
    int thread_id;
    int iterations;
    uint32_t *results;
    int result_count;
} ThreadData;

// Thread function for multi-threaded testing
static void *
thread_test_function(void *arg) {
    ThreadData *data = (ThreadData *)arg;

    for (int i = 0; i < data->iterations; i++) {
	uint32_t pos = ClockSweepTick(data->sweep);
	if (i < data->result_count) {
	    data->results[i] = pos;
	}
    }

    return NULL;
}

// Verify magic constant works correctly
static void
verify_magic_mod(uint32_t divisor, uint64_t magic) {
    printf("Testing divisor %u with magic %llu:\n", divisor, magic);

    // Test first few multiples of divisor
    for (uint32_t i = 0; i < divisor * 3; i++) {
	uint32_t expected = i % divisor;
	uint32_t actual = fast_mod(i, divisor, magic);

	if (actual != expected) {
	    printf("FAIL: %u %% %u = %u, got %u\n", i, divisor, expected, actual);
	    return;
	}
    }

    // Test some larger values
    uint32_t test_values[] = {1000, 10000, 100000, 1000000, 0xFFFFFFFF};
    for (int i = 0; i < 5; i++) {
	uint32_t n = test_values[i];
	uint32_t expected = n % divisor;
	uint32_t actual = fast_mod(n, divisor, magic);

	if (actual != expected) {
	    printf("FAIL: %u %% %u = %u, got %u\n", n, divisor, expected, actual);
	    return;
	}
    }

    printf("PASS: All tests passed for divisor %u\n", divisor);
}

// Test basic functionality
static void
test_basic_functionality(uint32_t *test_sizes, int num_sizes) {
    printf("=== Basic Functionality Test ===\n");

    for (int i = 0; i < num_sizes; i++) {
	uint32_t size = test_sizes[i];
	ClockSweep sweep;
	ClockSweepInit(&sweep, size);

	printf("Testing size %u (use_mask=%d, magic=%llu):\n",
	       size, sweep.use_mask, sweep.magic);

	// Verify magic constant if not using mask
	if (!sweep.use_mask) {
	    verify_magic_mod(size, sweep.magic);
	}

	// Test sequence of positions
	for (uint32_t j = 0; j < size * 2; j++) {
	    uint32_t pos = ClockSweepTick(&sweep);
	    uint32_t expected = j % size;	// fetch_add returns the pre-increment counter

	    if (pos != expected) {
		printf("FAIL: At iteration %u, expected %u, got %u\n", j, expected, pos);
		return;
	    }
	}

	// Test cycles
	uint64_t cycles = ClockSweepCycles(&sweep);
	if (cycles != 2) {
	    printf("FAIL: Expected 2 cycles, got %llu\n", cycles);
	    return;
	}

	printf("PASS: Size %u\n", size);
    }

    printf("=== Basic Functionality Test PASSED ===\n\n");
}

// Test edge cases and large counter values
static void
test_edge_cases(void) {
    printf("=== Edge Cases Test ===\n");

    ClockSweep sweep;
    ClockSweepInit(&sweep, 1023);

    // Test with large counter values
    uint64_t large_values[] = {
	0xFFFFFFFF,
	0x100000000ULL,
	0x123456789ABCDEFULL
    };

    for (int i = 0; i < 3; i++) {
	pg_atomic_init_u64(&sweep.counter, large_values[i]);

	uint32_t pos = ClockSweepPosition(&sweep);
	uint32_t expected = (uint32_t)large_values[i] % 1023;

	if (pos != expected) {
	    printf("FAIL: For counter %llu, expected %u, got %u\n",
		   large_values[i], expected, pos);
	    return;
	}

	printf("PASS: Large counter %llu -> position %u\n", large_values[i], pos);
    }

    printf("=== Edge Cases Test PASSED ===\n\n");
}

// Test multi-threaded access
static void
test_multithreaded(void) {
    printf("=== Multi-threaded Test ===\n");

    const int num_threads = 4;
    const int iterations_per_thread = 10000;
    const uint32_t size = 16384;  // Default size

    ClockSweep sweep;
    ClockSweepInit(&sweep, size);

    pthread_t threads[num_threads];
    ThreadData thread_data[num_threads];

    // Create threads
    for (int i = 0; i < num_threads; i++) {
	thread_data[i].sweep = &sweep;
	thread_data[i].thread_id = i;
	thread_data[i].iterations = iterations_per_thread;
	thread_data[i].results = malloc(1000 * sizeof(uint32_t));
	thread_data[i].result_count = 1000;

	pthread_create(&threads[i], NULL, thread_test_function, &thread_data[i]);
    }

    // Wait for threads to complete
    for (int i = 0; i < num_threads; i++) {
	pthread_join(threads[i], NULL);
    }

    // Verify final counter value
    uint64_t final_counter = pg_atomic_read_u64(&sweep.counter);
    uint64_t expected_counter = num_threads * iterations_per_thread;

    if (final_counter != expected_counter) {
	printf("FAIL: Expected final counter %llu, got %llu\n",
	       expected_counter, final_counter);

	// Clean up and return
	for (int i = 0; i < num_threads; i++) {
	    free(thread_data[i].results);
	}
	return;
    }

    // Verify all positions are valid
    for (int i = 0; i < num_threads; i++) {
	for (int j = 0; j < thread_data[i].result_count; j++) {
	    uint32_t pos = thread_data[i].results[j];
	    if (pos >= size) {
		printf("FAIL: Thread %d got invalid position %u (size=%u)\n",
		       i, pos, size);

		// Clean up and return
		for (int k = 0; k < num_threads; k++) {
		    free(thread_data[k].results);
		}
		return;
	    }
	}
	free(thread_data[i].results);
    }

    printf("PASS: Multi-threaded test with %d threads, %d iterations each\n",
	   num_threads, iterations_per_thread);
    printf("Final counter: %llu\n", final_counter);
    printf("=== Multi-threaded Test PASSED ===\n\n");
}

// Performance benchmark with modulo comparison
static void
benchmark_performance(uint32_t *test_sizes, int num_sizes) {
    const int iterations = 10000000;

    printf("=== Performance Benchmark ===\n");
    printf("Comparing optimized ClockSweep vs standard modulo operator\n\n");

    printf("Size     Method      Ops/sec (M)   Speedup\n");
    printf("----     ------      -----------   -------\n");

    for (int i = 0; i < num_sizes; i++) {
	uint32_t size = test_sizes[i];

	// Test optimized version
	ClockSweep sweep;
	ClockSweepInit(&sweep, size);

	clock_t start = clock();
	for (int j = 0; j < iterations; j++) {
	    ClockSweepTick(&sweep);
	}
	clock_t end = clock();

	double elapsed_opt = ((double)(end - start)) / CLOCKS_PER_SEC;
	double ops_per_sec_opt = iterations / elapsed_opt;

	// Test baseline modulo version
	ClockSweepBaseline baseline;
	ClockSweepBaselineInit(&baseline, size);

	start = clock();
	for (int j = 0; j < iterations; j++) {
	    ClockSweepBaselineTick(&baseline);
	}
	end = clock();

	double elapsed_mod = ((double)(end - start)) / CLOCKS_PER_SEC;
	double ops_per_sec_mod = iterations / elapsed_mod;

	double speedup = ops_per_sec_opt / ops_per_sec_mod;

	printf("%4u     %-6s      %8.2f      %5.2fx\n",
	       size,
	       sweep.use_mask ? "mask" : "magic",
	       ops_per_sec_opt / 1000000.0,
	       speedup);

	printf("%4u     modulo      %8.2f      %5.2fx\n",
	       size,
	       ops_per_sec_mod / 1000000.0,
	       1.0);

	printf("\n");
    }

    printf("=== Performance Benchmark COMPLETED ===\n\n");
}

// Main test function
int main(int argc, char *argv[]) {
    clock_t start_time = clock();

    uint32_t *sizes = malloc(sizeof(uint32_t) * argc);
    for (int i = 1; i < argc; i++)
      sizes[i - 1] = atoi(argv[i]);

    test_basic_functionality(sizes, argc - 1);
    test_edge_cases();
    test_multithreaded();
    benchmark_performance(sizes, argc - 1);

    clock_t end_time = clock();
    double total_time = ((double)(end_time - start_time)) / CLOCKS_PER_SEC;

    printf("All tests completed successfully!\n");
    printf("Total execution time: %.2f seconds\n", total_time);

    return 0;
}
From d73d862c7feaa346bf0c67d03396faae81fba717 Mon Sep 17 00:00:00 2001
From: Greg Burd <g...@burd.me>
Date: Thu, 10 Jul 2025 14:45:32 -0400
Subject: [PATCH v10 1/3] Eliminate the freelist from the buffer manager and
 depend on clock-sweep

This set of changes removes the list of available buffers (the
"freelist") and instead simply uses the clock-sweep algorithm to find and
return an available buffer.  While on the surface this appears to remove
an optimization, it in fact eliminates code that induces overhead in the
form of synchronization that is problematic for multi-core systems.
This also removes the have_free_buffer() function and simply caps the
autoprewarm module at prewarming at most NBuffers blocks.
---
 contrib/pg_prewarm/autoprewarm.c      |  31 ++++---
 src/backend/storage/buffer/README     |  42 +++------
 src/backend/storage/buffer/buf_init.c |   9 --
 src/backend/storage/buffer/bufmgr.c   |  29 +------
 src/backend/storage/buffer/freelist.c | 120 +++-----------------------
 src/include/storage/buf_internals.h   |  12 +--
 6 files changed, 43 insertions(+), 200 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index c01b9c7e6a4..2722b0bb443 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -370,6 +370,16 @@ apw_load_buffers(void)
 	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
 	apw_state->prewarmed_blocks = 0;
 
+
+	/* Don't prewarm more than we can fit. */
+	if (num_elements > NBuffers)
+	{
+		num_elements = NBuffers;
+		ereport(LOG,
+				(errmsg("autoprewarm: capping prewarmed blocks to %d (shared_buffers size)",
+						NBuffers)));
+	}
+
 	/* Get the info position of the first block of the next database. */
 	while (apw_state->prewarm_start_idx < num_elements)
 	{
@@ -410,10 +420,6 @@ apw_load_buffers(void)
 		apw_state->database = current_db;
 		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
 
-		/* If we've run out of free buffers, don't launch another worker. */
-		if (!have_free_buffer())
-			break;
-
 		/*
 		 * Likewise, don't launch if we've already been told to shut down.
 		 * (The launch would fail anyway, but we might as well skip it.)
@@ -462,12 +468,6 @@ apw_read_stream_next_block(ReadStream *stream,
 	{
 		BlockInfoRecord blk = p->block_info[p->pos];
 
-		if (!have_free_buffer())
-		{
-			p->pos = apw_state->prewarm_stop_idx;
-			return InvalidBlockNumber;
-		}
-
 		if (blk.tablespace != p->tablespace)
 			return InvalidBlockNumber;
 
@@ -523,10 +523,10 @@ autoprewarm_database_main(Datum main_arg)
 	blk = block_info[i];
 
 	/*
-	 * Loop until we run out of blocks to prewarm or until we run out of free
+	 * Loop until we run out of blocks to prewarm or until we run out of
 	 * buffers.
 	 */
-	while (i < apw_state->prewarm_stop_idx && have_free_buffer())
+	while (i < apw_state->prewarm_stop_idx)
 	{
 		Oid			tablespace = blk.tablespace;
 		RelFileNumber filenumber = blk.filenumber;
@@ -568,14 +568,13 @@ autoprewarm_database_main(Datum main_arg)
 
 		/*
 		 * We have a relation; now let's loop until we find a valid fork of
-		 * the relation or we run out of free buffers. Once we've read from
-		 * all valid forks or run out of options, we'll close the relation and
+		 * the relation or we run out of buffers. Once we've read from all
+		 * valid forks or run out of options, we'll close the relation and
 		 * move on.
 		 */
 		while (i < apw_state->prewarm_stop_idx &&
 			   blk.tablespace == tablespace &&
-			   blk.filenumber == filenumber &&
-			   have_free_buffer())
+			   blk.filenumber == filenumber)
 		{
 			ForkNumber	forknum = blk.forknum;
 			BlockNumber nblocks;
diff --git a/src/backend/storage/buffer/README b/src/backend/storage/buffer/README
index a182fcd660c..cd52effd911 100644
--- a/src/backend/storage/buffer/README
+++ b/src/backend/storage/buffer/README
@@ -128,11 +128,11 @@ independently.  If it is necessary to lock more than one partition at a time,
 they must be locked in partition-number order to avoid risk of deadlock.
 
 * A separate system-wide spinlock, buffer_strategy_lock, provides mutual
-exclusion for operations that access the buffer free list or select
-buffers for replacement.  A spinlock is used here rather than a lightweight
-lock for efficiency; no other locks of any sort should be acquired while
-buffer_strategy_lock is held.  This is essential to allow buffer replacement
-to happen in multiple backends with reasonable concurrency.
+exclusion for operations that select buffers for replacement.  A spinlock is
+used here rather than a lightweight lock for efficiency; no other locks of any
+sort should be acquired while buffer_strategy_lock is held.  This is essential
+to allow buffer replacement to happen in multiple backends with reasonable
+concurrency.
 
 * Each buffer header contains a spinlock that must be taken when examining
 or changing fields of that buffer header.  This allows operations such as
@@ -158,18 +158,9 @@ unset by sleeping on the buffer's condition variable.
 Normal Buffer Replacement Strategy
 ----------------------------------
 
-There is a "free list" of buffers that are prime candidates for replacement.
-In particular, buffers that are completely free (contain no valid page) are
-always in this list.  We could also throw buffers into this list if we
-consider their pages unlikely to be needed soon; however, the current
-algorithm never does that.  The list is singly-linked using fields in the
-buffer headers; we maintain head and tail pointers in global variables.
-(Note: although the list links are in the buffer headers, they are
-considered to be protected by the buffer_strategy_lock, not the buffer-header
-spinlocks.)  To choose a victim buffer to recycle when there are no free
-buffers available, we use a simple clock-sweep algorithm, which avoids the
-need to take system-wide locks during common operations.  It works like
-this:
+To choose a victim buffer to recycle when there are no free buffers available,
+we use a simple clock-sweep algorithm, which avoids the need to take
+system-wide locks during common operations.  It works like this:
 
 Each buffer header contains a usage counter, which is incremented (up to a
 small limit value) whenever the buffer is pinned.  (This requires only the
@@ -184,20 +175,14 @@ The algorithm for a process that needs to obtain a victim buffer is:
 
 1. Obtain buffer_strategy_lock.
 
-2. If buffer free list is nonempty, remove its head buffer.  Release
-buffer_strategy_lock.  If the buffer is pinned or has a nonzero usage count,
-it cannot be used; ignore it go back to step 1.  Otherwise, pin the buffer,
-and return it.
+2. Select the buffer pointed to by nextVictimBuffer, and circularly advance
+nextVictimBuffer for next time. Release buffer_strategy_lock.
 
-3. Otherwise, the buffer free list is empty.  Select the buffer pointed to by
-nextVictimBuffer, and circularly advance nextVictimBuffer for next time.
-Release buffer_strategy_lock.
-
-4. If the selected buffer is pinned or has a nonzero usage count, it cannot
+3. If the selected buffer is pinned or has a nonzero usage count, it cannot
 be used.  Decrement its usage count (if nonzero), reacquire
 buffer_strategy_lock, and return to step 3 to examine the next buffer.
 
-5. Pin the selected buffer, and return.
+4. Pin the selected buffer, and return.
 
 (Note that if the selected buffer is dirty, we will have to write it out
 before we can recycle it; if someone else pins the buffer meanwhile we will
@@ -234,7 +219,7 @@ the ring strategy effectively degrades to the normal strategy.
 
 VACUUM uses a ring like sequential scans, however, the size of this ring is
 controlled by the vacuum_buffer_usage_limit GUC.  Dirty pages are not removed
-from the ring.  Instead, WAL is flushed if needed to allow reuse of the
+from the ring.  Instead, the WAL is flushed if needed to allow reuse of the
 buffers.  Before introducing the buffer ring strategy in 8.3, VACUUM's buffers
 were sent to the freelist, which was effectively a buffer ring of 1 buffer,
 resulting in excessive WAL flushing.
@@ -277,3 +262,4 @@ As of 8.4, background writer starts during recovery mode when there is
 some form of potentially extended recovery to perform. It performs an
 identical service to normal processing, except that checkpoints it
 writes are technically restartpoints.
+
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index ed1dc488a42..6fd3a6bbac5 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -128,20 +128,11 @@ BufferManagerShmemInit(void)
 
 			pgaio_wref_clear(&buf->io_wref);
 
-			/*
-			 * Initially link all the buffers together as unused. Subsequent
-			 * management of this list is done by freelist.c.
-			 */
-			buf->freeNext = i + 1;
-
 			LWLockInitialize(BufferDescriptorGetContentLock(buf),
 							 LWTRANCHE_BUFFER_CONTENT);
 
 			ConditionVariableInit(BufferDescriptorGetIOCV(buf));
 		}
-
-		/* Correct last entry of linked list */
-		GetBufferDescriptor(NBuffers - 1)->freeNext = FREENEXT_END_OF_LIST;
 	}
 
 	/* Init other shared buffer-management stuff */
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6afdd28dba6..af5ef025229 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2099,12 +2099,6 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 */
 		UnpinBuffer(victim_buf_hdr);
 
-		/*
-		 * The victim buffer we acquired previously is clean and unused, let
-		 * it be found again quickly
-		 */
-		StrategyFreeBuffer(victim_buf_hdr);
-
 		/* remaining code should match code at top of routine */
 
 		existing_buf_hdr = GetBufferDescriptor(existing_buf_id);
@@ -2163,8 +2157,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 }
 
 /*
- * InvalidateBuffer -- mark a shared buffer invalid and return it to the
- * freelist.
+ * InvalidateBuffer -- mark a shared buffer invalid.
  *
  * The buffer header spinlock must be held at entry.  We drop it before
  * returning.  (This is sane because the caller must have locked the
@@ -2262,11 +2255,6 @@ retry:
 	 * Done with mapping lock.
 	 */
 	LWLockRelease(oldPartitionLock);
-
-	/*
-	 * Insert the buffer at the head of the list of free buffers.
-	 */
-	StrategyFreeBuffer(buf);
 }
 
 /*
@@ -2684,11 +2672,6 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 		{
 			BufferDesc *buf_hdr = GetBufferDescriptor(buffers[i] - 1);
 
-			/*
-			 * The victim buffer we acquired previously is clean and unused,
-			 * let it be found again quickly
-			 */
-			StrategyFreeBuffer(buf_hdr);
 			UnpinBuffer(buf_hdr);
 		}
 
@@ -2763,12 +2746,6 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 			valid = PinBuffer(existing_hdr, strategy);
 
 			LWLockRelease(partition_lock);
-
-			/*
-			 * The victim buffer we acquired previously is clean and unused,
-			 * let it be found again quickly
-			 */
-			StrategyFreeBuffer(victim_buf_hdr);
 			UnpinBuffer(victim_buf_hdr);
 
 			buffers[i] = BufferDescriptorGetBuffer(existing_hdr);
@@ -3666,8 +3643,8 @@ BgBufferSync(WritebackContext *wb_context)
 	uint32		new_recent_alloc;
 
 	/*
-	 * Find out where the freelist clock sweep currently is, and how many
-	 * buffer allocations have happened since our last call.
+	 * Find out where the clock sweep currently is, and how many buffer
+	 * allocations have happened since our last call.
 	 */
 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
 
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 01909be0272..162c140fb9d 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -39,14 +39,6 @@ typedef struct
 	 */
 	pg_atomic_uint32 nextVictimBuffer;
 
-	int			firstFreeBuffer;	/* Head of list of unused buffers */
-	int			lastFreeBuffer; /* Tail of list of unused buffers */
-
-	/*
-	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
-	 * when the list is empty)
-	 */
-
 	/*
 	 * Statistics.  These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
@@ -164,17 +156,16 @@ ClockSweepTick(void)
 }
 
 /*
- * have_free_buffer -- a lockless check to see if there is a free buffer in
- *					   buffer pool.
+ * have_free_buffer -- check if we've filled the buffer pool at startup
  *
- * If the result is true that will become stale once free buffers are moved out
- * by other operations, so the caller who strictly want to use a free buffer
- * should not call this.
+ * Used exclusively by autoprewarm.
  */
 bool
 have_free_buffer(void)
 {
-	if (StrategyControl->firstFreeBuffer >= 0)
+	uint64		hand = pg_atomic_read_u64(&StrategyControl->nextVictimBuffer);
+
+	if (hand < NBuffers)
 		return true;
 	else
 		return false;
@@ -243,75 +234,14 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_r
 	}
 
 	/*
-	 * We count buffer allocation requests so that the bgwriter can estimate
-	 * the rate of buffer consumption.  Note that buffers recycled by a
-	 * strategy object are intentionally not counted here.
+	 * We keep an approximate count of buffer allocation requests so that the
+	 * bgwriter can estimate the rate of buffer consumption.  Note that
+	 * buffers recycled by a strategy object are intentionally not counted
+	 * here.
 	 */
 	pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
 
-	/*
-	 * First check, without acquiring the lock, whether there's buffers in the
-	 * freelist. Since we otherwise don't require the spinlock in every
-	 * StrategyGetBuffer() invocation, it'd be sad to acquire it here -
-	 * uselessly in most cases. That obviously leaves a race where a buffer is
-	 * put on the freelist but we don't see the store yet - but that's pretty
-	 * harmless, it'll just get used during the next buffer acquisition.
-	 *
-	 * If there's buffers on the freelist, acquire the spinlock to pop one
-	 * buffer of the freelist. Then check whether that buffer is usable and
-	 * repeat if not.
-	 *
-	 * Note that the freeNext fields are considered to be protected by the
-	 * buffer_strategy_lock not the individual buffer spinlocks, so it's OK to
-	 * manipulate them without holding the spinlock.
-	 */
-	if (StrategyControl->firstFreeBuffer >= 0)
-	{
-		while (true)
-		{
-			/* Acquire the spinlock to remove element from the freelist */
-			SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
-
-			if (StrategyControl->firstFreeBuffer < 0)
-			{
-				SpinLockRelease(&StrategyControl->buffer_strategy_lock);
-				break;
-			}
-
-			buf = GetBufferDescriptor(StrategyControl->firstFreeBuffer);
-			Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);
-
-			/* Unconditionally remove buffer from freelist */
-			StrategyControl->firstFreeBuffer = buf->freeNext;
-			buf->freeNext = FREENEXT_NOT_IN_LIST;
-
-			/*
-			 * Release the lock so someone else can access the freelist while
-			 * we check out this buffer.
-			 */
-			SpinLockRelease(&StrategyControl->buffer_strategy_lock);
-
-			/*
-			 * If the buffer is pinned or has a nonzero usage_count, we cannot
-			 * use it; discard it and retry.  (This can only happen if VACUUM
-			 * put a valid buffer in the freelist and then someone else used
-			 * it before we got to it.  It's probably impossible altogether as
-			 * of 8.3, but we'd better check anyway.)
-			 */
-			local_buf_state = LockBufHdr(buf);
-			if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0
-				&& BUF_STATE_GET_USAGECOUNT(local_buf_state) == 0)
-			{
-				if (strategy != NULL)
-					AddBufferToRing(strategy, buf);
-				*buf_state = local_buf_state;
-				return buf;
-			}
-			UnlockBufHdr(buf, local_buf_state);
-		}
-	}
-
-	/* Nothing on the freelist, so run the "clock sweep" algorithm */
+	/* Use the "clock sweep" algorithm to find a free buffer */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -356,29 +286,6 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_r
 	}
 }
 
-/*
- * StrategyFreeBuffer: put a buffer on the freelist
- */
-void
-StrategyFreeBuffer(BufferDesc *buf)
-{
-	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
-
-	/*
-	 * It is possible that we are told to put something in the freelist that
-	 * is already in it; don't screw up the list if so.
-	 */
-	if (buf->freeNext == FREENEXT_NOT_IN_LIST)
-	{
-		buf->freeNext = StrategyControl->firstFreeBuffer;
-		if (buf->freeNext < 0)
-			StrategyControl->lastFreeBuffer = buf->buf_id;
-		StrategyControl->firstFreeBuffer = buf->buf_id;
-	}
-
-	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
-}
-
 /*
  * StrategySyncStart -- tell BgBufferSync where to start syncing
  *
@@ -504,13 +411,6 @@ StrategyInitialize(bool init)
 
 		SpinLockInit(&StrategyControl->buffer_strategy_lock);
 
-		/*
-		 * Grab the whole linked list of free buffers for our strategy. We
-		 * assume it was previously set up by BufferManagerShmemInit().
-		 */
-		StrategyControl->firstFreeBuffer = 0;
-		StrategyControl->lastFreeBuffer = NBuffers - 1;
-
 		/* Initialize the clock sweep pointer */
 		pg_atomic_init_u32(&StrategyControl->nextVictimBuffer, 0);
 
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 52a71b138f7..00eade63971 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -217,8 +217,7 @@ BufMappingPartitionLockByIndex(uint32 index)
  * single atomic variable.  This layout allow us to do some operations in a
  * single atomic operation, without actually acquiring and releasing spinlock;
  * for instance, increase or decrease refcount.  buf_id field never changes
- * after initialization, so does not need locking.  freeNext is protected by
- * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
+ * after initialization, so does not need locking.  The LWLock can take care
  * of itself.  The buffer header lock is *not* used to control access to the
  * data in the buffer!
  *
@@ -264,7 +263,6 @@ typedef struct BufferDesc
 	pg_atomic_uint32 state;
 
 	int			wait_backend_pgprocno;	/* backend of pin-count waiter */
-	int			freeNext;		/* link in freelist chain */
 
 	PgAioWaitRef io_wref;		/* set iff AIO is in progress */
 	LWLock		content_lock;	/* to lock access to buffer contents */
@@ -360,13 +358,6 @@ BufferDescriptorGetContentLock(const BufferDesc *bdesc)
 	return (LWLock *) (&bdesc->content_lock);
 }
 
-/*
- * The freeNext field is either the index of the next freelist entry,
- * or one of these special values:
- */
-#define FREENEXT_END_OF_LIST	(-1)
-#define FREENEXT_NOT_IN_LIST	(-2)
-
 /*
  * Functions for acquiring/releasing a shared buffer header's spinlock.  Do
  * not apply these to local buffers!
@@ -453,7 +444,6 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
-extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
-- 
2.49.0

From bfde3244542b87c369d339c859578817a2089fd8 Mon Sep 17 00:00:00 2001
From: Greg Burd <g...@burd.me>
Date: Fri, 11 Jul 2025 09:05:45 -0400
Subject: [PATCH v10 2/3] Remove the buffer_strategy_lock and make the clock
 hand a 64 bit atomic

Change nextVictimBuffer to an atomic uint64 and simply atomically
increment it by 1 at each tick.  The next victim buffer is the value of
nextVictimBuffer modulo the number of buffers (NBuffers).  The number of
complete passes of the clock-sweep hand is nextVictimBuffer divided by
NBuffers.  Wrap-around of nextVictimBuffer would require roughly 10 years
of continuous operation at ~59 billion ticks per second without a
restart, which is so unlikely that we ignore that case entirely.

With the removal of the freelist and completePasses, none of the
remaining items in the BufferStrategyControl structure require strict
coordination, so it is possible to eliminate the buffer_strategy_lock as
well.
---
 src/backend/storage/buffer/README     |  48 ++++-----
 src/backend/storage/buffer/bufmgr.c   |  20 +++-
 src/backend/storage/buffer/freelist.c | 139 ++++++--------------------
 src/backend/storage/buffer/localbuf.c |   2 +-
 src/include/storage/buf_internals.h   |   4 +-
 5 files changed, 71 insertions(+), 142 deletions(-)

diff --git a/src/backend/storage/buffer/README b/src/backend/storage/buffer/README
index cd52effd911..d1ab222eeb8 100644
--- a/src/backend/storage/buffer/README
+++ b/src/backend/storage/buffer/README
@@ -127,11 +127,10 @@ bits of the tag's hash value.  The rules stated above apply to each partition
 independently.  If it is necessary to lock more than one partition at a time,
 they must be locked in partition-number order to avoid risk of deadlock.
 
-* A separate system-wide spinlock, buffer_strategy_lock, provides mutual
-exclusion for operations that select buffers for replacement.  A spinlock is
-used here rather than a lightweight lock for efficiency; no other locks of any
-sort should be acquired while buffer_strategy_lock is held.  This is essential
-to allow buffer replacement to happen in multiple backends with reasonable
+* Operations that select buffers for replacement don't require a lock, but
+rather use atomic operations to ensure coordination across backends when
+accessing members of the BufferStrategyControl data structure. This allows
+buffer replacement to happen in multiple backends with reasonable
 concurrency.
 
 * Each buffer header contains a spinlock that must be taken when examining
@@ -158,9 +157,9 @@ unset by sleeping on the buffer's condition variable.
 Normal Buffer Replacement Strategy
 ----------------------------------
 
-To choose a victim buffer to recycle when there are no free buffers available,
-we use a simple clock-sweep algorithm, which avoids the need to take
-system-wide locks during common operations.  It works like this:
+To choose a victim buffer to recycle we use a simple clock-sweep algorithm,
+which avoids the need to take system-wide locks during common operations.  It
+works like this:
 
 Each buffer header contains a usage counter, which is incremented (up to a
 small limit value) whenever the buffer is pinned.  (This requires only the
@@ -168,19 +167,17 @@ buffer header spinlock, which would have to be taken anyway to increment the
 buffer reference count, so it's nearly free.)
 
 The "clock hand" is a buffer index, nextVictimBuffer, that moves circularly
-through all the available buffers.  nextVictimBuffer is protected by the
-buffer_strategy_lock.
+through all the available buffers.  nextVictimBuffer is a 64-bit atomic
+counter that only increases; complete passes are derived from it.
 
 The algorithm for a process that needs to obtain a victim buffer is:
 
-1. Obtain buffer_strategy_lock.
+1. Select the buffer pointed to by nextVictimBuffer, and circularly advance
+nextVictimBuffer for next time.
 
-2. Select the buffer pointed to by nextVictimBuffer, and circularly advance
-nextVictimBuffer for next time. Release buffer_strategy_lock.
-
-3. If the selected buffer is pinned or has a nonzero usage count, it cannot
-be used.  Decrement its usage count (if nonzero), reacquire
-buffer_strategy_lock, and return to step 3 to examine the next buffer.
+2. If the selected buffer is pinned or has a nonzero usage count, it cannot be
+used.  Decrement its usage count (if nonzero) and return to step 1 to examine
+the next buffer.
 
 4. Pin the selected buffer, and return.
 
@@ -196,9 +193,9 @@ Buffer Ring Replacement Strategy
 When running a query that needs to access a large number of pages just once,
 such as VACUUM or a large sequential scan, a different strategy is used.
 A page that has been touched only by such a scan is unlikely to be needed
-again soon, so instead of running the normal clock sweep algorithm and
+again soon, so instead of running the normal clock-sweep algorithm and
 blowing out the entire buffer cache, a small ring of buffers is allocated
-using the normal clock sweep algorithm and those buffers are reused for the
+using the normal clock-sweep algorithm and those buffers are reused for the
 whole scan.  This also implies that much of the write traffic caused by such
 a statement will be done by the backend itself and not pushed off onto other
 processes.
@@ -244,13 +241,12 @@ nextVictimBuffer (which it does not change!), looking for buffers that are
 dirty and not pinned nor marked with a positive usage count.  It pins,
 writes, and releases any such buffer.
 
-If we can assume that reading nextVictimBuffer is an atomic action, then
-the writer doesn't even need to take buffer_strategy_lock in order to look
-for buffers to write; it needs only to spinlock each buffer header for long
-enough to check the dirtybit.  Even without that assumption, the writer
-only needs to take the lock long enough to read the variable value, not
-while scanning the buffers.  (This is a very substantial improvement in
-the contention cost of the writer compared to PG 8.0.)
+Reading nextVictimBuffer is a single atomic operation, so the writer does not
+need to take any lock in order to look for buffers to write; it needs only to
+spinlock each buffer header for long enough to check the dirtybit.  The writer
+reads the variable's value once, before it starts scanning the buffers, rather
+than holding any lock while scanning.  (This is a very substantial improvement
+in the contention cost of the writer compared to PG 8.0.)
 
 The background writer takes shared content lock on a buffer while writing it
 out (and anyone else who flushes buffer contents to disk must do so too).
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index af5ef025229..09d054a616f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3593,7 +3593,7 @@ BufferSync(int flags)
  * This is called periodically by the background writer process.
  *
  * Returns true if it's appropriate for the bgwriter process to go into
- * low-power hibernation mode.  (This happens if the strategy clock sweep
+ * low-power hibernation mode.  (This happens if the strategy clock-sweep
  * has been "lapped" and no buffer allocations have occurred recently,
  * or if the bgwriter has been effectively disabled by setting
  * bgwriter_lru_maxpages to 0.)
@@ -3643,7 +3643,7 @@ BgBufferSync(WritebackContext *wb_context)
 	uint32		new_recent_alloc;
 
 	/*
-	 * Find out where the clock sweep currently is, and how many buffer
+	 * Find out where the clock-sweep currently is, and how many buffer
 	 * allocations have happened since our last call.
 	 */
 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
@@ -3664,15 +3664,25 @@ BgBufferSync(WritebackContext *wb_context)
 
 	/*
 	 * Compute strategy_delta = how many buffers have been scanned by the
-	 * clock sweep since last time.  If first time through, assume none. Then
-	 * see if we are still ahead of the clock sweep, and if so, how many
+	 * clock-sweep since last time.  If first time through, assume none. Then
+	 * see if we are still ahead of the clock-sweep, and if so, how many
 	 * buffers we could scan before we'd catch up with it and "lap" it. Note:
 	 * weird-looking coding of xxx_passes comparisons are to avoid bogus
 	 * behavior when the passes counts wrap around.
 	 */
 	if (saved_info_valid)
 	{
-		int32		passes_delta = strategy_passes - prev_strategy_passes;
+		int32		passes_delta;
+
+		/*
+		 * It is highly unlikely that the uint64 clock-sweep hand would
+		 * ever wrap: that would require roughly 10 years of continuous
+		 * operation at ~59 billion ticks per second without a single
+		 * restart.
+		 */
+		Assert(prev_strategy_passes <= strategy_passes);
+
+		passes_delta = strategy_passes - prev_strategy_passes;
 
 		strategy_delta = strategy_buf_id - prev_strategy_buf_id;
 		strategy_delta += (long) passes_delta * NBuffers;
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 162c140fb9d..906b35be4c1 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,23 +27,20 @@
 /*
  * The shared freelist control information.
  */
-typedef struct
-{
-	/* Spinlock: protects the values below */
-	slock_t		buffer_strategy_lock;
-
+typedef struct {
 	/*
-	 * Clock sweep hand: index of next buffer to consider grabbing. Note that
-	 * this isn't a concrete buffer - we only ever increase the value. So, to
-	 * get an actual buffer, it needs to be used modulo NBuffers.
+	 * The clock-sweep hand is atomically incremented by 1 at every tick.  Use
+	 * the macro CLOCK_HAND_POSITION() to find the next victim's index in the
+	 * BufferDescriptor array.  To calculate the number of times the clock-sweep
+	 * hand has made a complete pass through all available buffers in the pool,
+	 * divide the hand by NBuffers.
 	 */
-	pg_atomic_uint32 nextVictimBuffer;
+	pg_atomic_uint64 nextVictimBuffer;
 
 	/*
 	 * Statistics.  These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
 	 */
-	uint32		completePasses; /* Complete cycles of the clock sweep */
 	pg_atomic_uint32 numBufferAllocs;	/* Buffers allocated since last reset */
 
 	/*
@@ -83,13 +80,15 @@ typedef struct BufferAccessStrategyData
 	Buffer		buffers[FLEXIBLE_ARRAY_MEMBER];
 }			BufferAccessStrategyData;
 
-
 /* Prototypes for internal functions */
 static BufferDesc *GetBufferFromRing(BufferAccessStrategy strategy,
 									 uint32 *buf_state);
 static void AddBufferToRing(BufferAccessStrategy strategy,
 							BufferDesc *buf);
 
+#define CLOCK_HAND_POSITION(counter) \
+	(((counter) & 0xFFFFFFFF) % NBuffers)
+
 /*
  * ClockSweepTick - Helper routine for StrategyGetBuffer()
  *
@@ -99,6 +98,7 @@ static void AddBufferToRing(BufferAccessStrategy strategy,
 static inline uint32
 ClockSweepTick(void)
 {
+	uint64		hand = UINT64_MAX;
 	uint32		victim;
 
 	/*
@@ -106,71 +106,14 @@ ClockSweepTick(void)
 	 * doing this, this can lead to buffers being returned slightly out of
 	 * apparent order.
 	 */
-	victim =
-		pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);
-
-	if (victim >= NBuffers)
-	{
-		uint32		originalVictim = victim;
-
-		/* always wrap what we look up in BufferDescriptors */
-		victim = victim % NBuffers;
+	hand = pg_atomic_fetch_add_u64(&StrategyControl->nextVictimBuffer, 1);
 
-		/*
-		 * If we're the one that just caused a wraparound, force
-		 * completePasses to be incremented while holding the spinlock. We
-		 * need the spinlock so StrategySyncStart() can return a consistent
-		 * value consisting of nextVictimBuffer and completePasses.
-		 */
-		if (victim == 0)
-		{
-			uint32		expected;
-			uint32		wrapped;
-			bool		success = false;
+	victim = CLOCK_HAND_POSITION(hand);
+	Assert(victim < NBuffers);
 
-			expected = originalVictim + 1;
-
-			while (!success)
-			{
-				/*
-				 * Acquire the spinlock while increasing completePasses. That
-				 * allows other readers to read nextVictimBuffer and
-				 * completePasses in a consistent manner which is required for
-				 * StrategySyncStart().  In theory delaying the increment
-				 * could lead to an overflow of nextVictimBuffers, but that's
-				 * highly unlikely and wouldn't be particularly harmful.
-				 */
-				SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
-
-				wrapped = expected % NBuffers;
-
-				success = pg_atomic_compare_exchange_u32(&StrategyControl->nextVictimBuffer,
-														 &expected, wrapped);
-				if (success)
-					StrategyControl->completePasses++;
-				SpinLockRelease(&StrategyControl->buffer_strategy_lock);
-			}
-		}
-	}
 	return victim;
 }
 
-/*
- * have_free_buffer -- check if we've filled the buffer pool at startup
- *
- * Used exclusively by autoprewarm.
- */
-bool
-have_free_buffer(void)
-{
-	uint64		hand = pg_atomic_read_u64(&StrategyControl->nextVictimBuffer);
-
-	if (hand < NBuffers)
-		return true;
-	else
-		return false;
-}
-
 /*
  * StrategyGetBuffer
  *
@@ -193,10 +136,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_r
 
 	*from_ring = false;
 
-	/*
-	 * If given a strategy object, see whether it can select a buffer. We
-	 * assume strategy objects don't need buffer_strategy_lock.
-	 */
+	/* If given a strategy object, see whether it can select a buffer */
 	if (strategy != NULL)
 	{
 		buf = GetBufferFromRing(strategy, buf_state);
@@ -241,7 +181,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_r
 	 */
 	pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
 
-	/* Use the "clock sweep" algorithm to find a free buffer */
+	/* Use the "clock-sweep" algorithm to find a free buffer */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -292,37 +232,30 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_r
  * The result is the buffer index of the best buffer to sync first.
  * BgBufferSync() will proceed circularly around the buffer array from there.
  *
- * In addition, we return the completed-pass count (which is effectively
- * the higher-order bits of nextVictimBuffer) and the count of recent buffer
- * allocs if non-NULL pointers are passed.  The alloc count is reset after
- * being read.
+ * In addition, we return the completed-pass count and the count of recent
+ * buffer allocs if non-NULL pointers are passed.  The alloc count is reset
+ * after being read.
  */
-int
-StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
-{
-	uint32		nextVictimBuffer;
-	int			result;
+uint32 StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc) {
+	uint64		counter = UINT64_MAX; uint32		result;
 
-	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
-	nextVictimBuffer = pg_atomic_read_u32(&StrategyControl->nextVictimBuffer);
-	result = nextVictimBuffer % NBuffers;
+	counter = pg_atomic_read_u64(&StrategyControl->nextVictimBuffer);
+	result = CLOCK_HAND_POSITION(counter);
 
 	if (complete_passes)
 	{
-		*complete_passes = StrategyControl->completePasses;
-
 		/*
-		 * Additionally add the number of wraparounds that happened before
-		 * completePasses could be incremented. C.f. ClockSweepTick().
+		 * The number of complete passes is the counter divided by NBuffers
+		 * because the clock hand is a 64-bit counter that only increases.
 		 */
-		*complete_passes += nextVictimBuffer / NBuffers;
+		*complete_passes = (uint32) (counter / NBuffers);
 	}
 
 	if (num_buf_alloc)
 	{
 		*num_buf_alloc = pg_atomic_exchange_u32(&StrategyControl->numBufferAllocs, 0);
 	}
-	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
+
 	return result;
 }
 
@@ -337,21 +270,14 @@ StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
 void
 StrategyNotifyBgWriter(int bgwprocno)
 {
-	/*
-	 * We acquire buffer_strategy_lock just to ensure that the store appears
-	 * atomic to StrategyGetBuffer.  The bgwriter should call this rather
-	 * infrequently, so there's no performance penalty from being safe.
-	 */
-	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
 	StrategyControl->bgwprocno = bgwprocno;
-	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
 }
 
 
 /*
  * StrategyShmemSize
  *
- * estimate the size of shared memory used by the freelist-related structures.
+ * Estimate the size of shared memory used by the freelist-related structures.
  *
  * Note: for somewhat historical reasons, the buffer lookup hashtable size
  * is also determined here.
@@ -409,13 +335,10 @@ StrategyInitialize(bool init)
 		 */
 		Assert(init);
 
-		SpinLockInit(&StrategyControl->buffer_strategy_lock);
-
-		/* Initialize the clock sweep pointer */
-		pg_atomic_init_u32(&StrategyControl->nextVictimBuffer, 0);
+		/* Initialize combined clock-sweep pointer/complete passes counter */
+		pg_atomic_init_u64(&StrategyControl->nextVictimBuffer, 0);
 
 		/* Clear statistics */
-		StrategyControl->completePasses = 0;
 		pg_atomic_init_u32(&StrategyControl->numBufferAllocs, 0);
 
 		/* No pending notification */
@@ -659,7 +582,7 @@ GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
 	 *
 	 * If usage_count is 0 or 1 then the buffer is fair game (we expect 1,
 	 * since our own previous usage of the ring element would have left it
-	 * there, but it might've been decremented by clock sweep since then). A
+	 * there, but it might've been decremented by clock-sweep since then). A
 	 * higher usage_count indicates someone else has touched the buffer, so we
 	 * shouldn't re-use it.
 	 */
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 3da9c41ee1d..7a34f5e430a 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -229,7 +229,7 @@ GetLocalVictimBuffer(void)
 	ResourceOwnerEnlarge(CurrentResourceOwner);
 
 	/*
-	 * Need to get a new buffer.  We use a clock sweep algorithm (essentially
+	 * Need to get a new buffer.  We use a clock-sweep algorithm (essentially
 	 * the same as what freelist.c does now...)
 	 */
 	trycounter = NLocBuffer;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 00eade63971..97002acb757 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -81,7 +81,7 @@ StaticAssertDecl(BUF_REFCOUNT_BITS + BUF_USAGECOUNT_BITS + BUF_FLAG_BITS == 32,
  * accuracy and speed of the clock-sweep buffer management algorithm.  A
  * large value (comparable to NBuffers) would approximate LRU semantics.
  * But it can take as many as BM_MAX_USAGE_COUNT+1 complete cycles of
- * clock sweeps to find a free buffer, so in practice we don't want the
+ * clock-sweeps to find a free buffer, so in practice we don't want the
  * value to be very large.
  */
 #define BM_MAX_USAGE_COUNT	5
@@ -439,7 +439,7 @@ extern void StrategyFreeBuffer(BufferDesc *buf);
 extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
 								 BufferDesc *buf, bool from_ring);
 
-extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
+extern uint32 StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
 extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
-- 
2.49.0
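
To make the combined counter easier to reason about outside the tree, here is a
minimal standalone sketch of the idea (an illustration only, not patch code):
C11 atomics stand in for pg_atomic_*, NBUFFERS is a stand-in for NBuffers, and
the mask mirrors CLOCK_HAND_POSITION().

/*
 * Toy model of the combined clock-hand/complete-passes counter: one
 * monotonically increasing 64-bit value replaces the spinlock-protected
 * pair (nextVictimBuffer, completePasses).
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NBUFFERS 16384				/* stand-in for NBuffers */

static _Atomic uint64_t hand;		/* only ever incremented */

/* Advance the hand one tick and return the claimed buffer index. */
static uint32_t
tick(void)
{
	uint64_t	counter = atomic_fetch_add(&hand, 1);

	return (uint32_t) ((counter & 0xFFFFFFFF) % NBUFFERS);
}

/* A single atomic read yields both the hand position and the passes. */
static void
sync_start(uint32_t *position, uint32_t *passes)
{
	uint64_t	counter = atomic_load(&hand);

	*position = (uint32_t) ((counter & 0xFFFFFFFF) % NBUFFERS);
	*passes = (uint32_t) (counter / NBUFFERS);
}

int
main(void)
{
	uint32_t	pos,
				passes;

	for (int i = 0; i < 3 * NBUFFERS + 5; i++)
		(void) tick();

	sync_start(&pos, &passes);
	printf("hand at %u after %u passes\n",
		   (unsigned) pos, (unsigned) passes);	/* 5, 3 */
	return 0;
}

The point is that StrategySyncStart() can derive both the hand position and the
completed-pass count from one 64-bit read, which is why it no longer needs
buffer_strategy_lock to keep the two values consistent.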

From 8195dd714cfa14ffd68a04a36ac5496eba421897 Mon Sep 17 00:00:00 2001
From: Greg Burd <g...@burd.me>
Date: Fri, 25 Jul 2025 11:53:10 -0400
Subject: [PATCH v10 3/3] Abstract clock-sweep buffer replacement algorithm

Re-author the clock-sweep algorithm such that it maintains its own state
and has a well-defined API.
---
 src/backend/storage/buffer/README     | 20 +++---
 src/backend/storage/buffer/freelist.c | 88 ++++++++++++++-------------
 src/tools/pgindent/typedefs.list      |  1 +
 3 files changed, 57 insertions(+), 52 deletions(-)

diff --git a/src/backend/storage/buffer/README b/src/backend/storage/buffer/README
index d1ab222eeb8..3f31d04d572 100644
--- a/src/backend/storage/buffer/README
+++ b/src/backend/storage/buffer/README
@@ -166,14 +166,13 @@ small limit value) whenever the buffer is pinned.  (This requires only the
 buffer header spinlock, which would have to be taken anyway to increment the
 buffer reference count, so it's nearly free.)
 
-The "clock hand" is a buffer index, nextVictimBuffer, that moves circularly
-through all the available buffers.  nextVictimBuffer and completePasses are
-atomic values.
+The "clock hand" is a buffer index that moves circularly through all the
+available buffers.
 
 The algorithm for a process that needs to obtain a victim buffer is:
 
-1. Select the buffer pointed to by nextVictimBuffer, and circularly advance
-nextVictimBuffer for next time.
+1. Select the buffer pointed to by the clock hand, and circularly advance it
+for next time.
 
 2. If the selected buffer is pinned or has a nonzero usage count, it cannot be
 used.  Decrement its usage count (if nonzero), return to step 3 to examine the
@@ -235,13 +234,12 @@ Background Writer's Processing
 ------------------------------
 
 The background writer is designed to write out pages that are likely to be
-recycled soon, thereby offloading the writing work from active backends.
-To do this, it scans forward circularly from the current position of
-nextVictimBuffer (which it does not change!), looking for buffers that are
-dirty and not pinned nor marked with a positive usage count.  It pins,
-writes, and releases any such buffer.
+recycled soon, thereby offloading the writing work from active backends. To do
+this, it scans forward circularly from the clock hand's current position (which it
+does not change!), looking for buffers that are dirty and not pinned nor marked
+with a positive usage count.  It pins, writes, and releases any such buffer.
 
-We enforce reading nextVictimBuffer within an atomic action so it needs only to
+We enforce reading the clock hand within an atomic action so it needs only to
 spinlock each buffer header for long enough to check the dirtybit.  Even
 without that assumption, the writer only needs to take the lock long enough to
 read the variable value, not while scanning the buffers. (This is a very
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 906b35be4c1..71839dfdee9 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -23,19 +23,22 @@
 
 #define INT_ACCESS_ONCE(var)	((int)(*((volatile int *)&(var))))
 
+typedef struct ClockSweep
+{
+	pg_atomic_uint64 counter;	/* monotonically increasing; +1 per tick */
+	uint32		size;			/* number of buffers in the ring */
+} ClockSweep;
 
 /*
  * The shared freelist control information.
  */
-typedef struct {
+typedef struct
+{
 	/*
-	 * The clock-sweep hand is atomically updated by 1 at every tick.  Use the
-	 * macro CLOCK_HAND_POSITION() o find the next victim's index in the
-	 * BufferDescriptor array. To calculate the number of times the clock-sweep
-	 * hand has made a complete pass through all available buffers in the pool
-	 * divide NBuffers.
+	 * The next buffer available for use is determined by the clock-sweep
+	 * algorithm.
 	 */
-	pg_atomic_uint64 nextVictimBuffer;
+	ClockSweep	clock;
 
 	/*
 	 * Statistics.  These counters should be wide enough that they can't
@@ -86,32 +89,40 @@ static BufferDesc *GetBufferFromRing(BufferAccessStrategy strategy,
 static void AddBufferToRing(BufferAccessStrategy strategy,
 							BufferDesc *buf);
 
-#define CLOCK_HAND_POSITION(counter) \
-	((counter) & 0xFFFFFFFF) % NBuffers
+static void
+ClockSweepInit(ClockSweep *sweep, uint32 size)
+{
+	pg_atomic_init_u64(&sweep->counter, 0);
+	sweep->size = size;
+}
 
-/*
- * ClockSweepTick - Helper routine for StrategyGetBuffer()
- *
- * Move the clock hand one buffer ahead of its current position and return the
- * id of the buffer now under the hand.
- */
+/* Extract the number of complete cycles from the clock hand */
 static inline uint32
-ClockSweepTick(void)
+ClockSweepCycles(ClockSweep *sweep)
 {
-	uint64		hand = UINT64_MAX;
-	uint32		victim;
+	uint64		current = pg_atomic_read_u64(&sweep->counter);
 
-	/*
-	 * Atomically move hand ahead one buffer - if there's several processes
-	 * doing this, this can lead to buffers being returned slightly out of
-	 * apparent order.
-	 */
-	hand = pg_atomic_fetch_add_u64(&StrategyControl->nextVictimBuffer, 1);
+	return current / sweep->size;
+}
+
+/* Return the current position of the clock's hand modulo size */
+static inline uint32
+ClockSweepPosition(ClockSweep *sweep)
+{
+	uint64		counter = pg_atomic_read_u64(&sweep->counter);
+
+	return ((counter) & 0xFFFFFFFF) % sweep->size;
+}
 
-	victim = CLOCK_HAND_POSITION(hand);
-	Assert(victim < NBuffers);
+/*
+ * Advance the clock hand by one and return the buffer index it advanced from.
+ */
+static inline uint32
+ClockSweepTick(ClockSweep *sweep)
+{
+	uint64		counter = pg_atomic_fetch_add_u64(&sweep->counter, 1);
 
-	return victim;
+	return ((counter) & 0xFFFFFFFF) % sweep->size;
 }
 
 /*
@@ -181,11 +192,11 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_r
 	 */
 	pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
 
-	/* Use the "clock-sweep" algorithm to find a free buffer */
+	/* Use the clock-sweep algorithm to find a free buffer */
 	trycounter = NBuffers;
 	for (;;)
 	{
-		buf = GetBufferDescriptor(ClockSweepTick());
+		buf = GetBufferDescriptor(ClockSweepTick(&StrategyControl->clock));
 
 		/*
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
@@ -236,19 +247,14 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_r
  * buffer allocs if non-NULL pointers are passed.  The alloc count is reset
  * after being read.
  */
-uint32 StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc) {
-	uint64		counter = UINT64_MAX; uint32		result;
-
-	counter = pg_atomic_read_u64(&StrategyControl->nextVictimBuffer);
-	result = CLOCK_HAND_POSITION(counter);
+uint32
+StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
+{
+	uint32		result = ClockSweepPosition(&StrategyControl->clock);
 
 	if (complete_passes)
 	{
-		/*
-		 * The number of complete passes is the counter divided by NBuffers
-		 * because the clock hand is a 64-bit counter that only increases.
-		 */
-		*complete_passes = (uint32) (counter / NBuffers);
+		*complete_passes = ClockSweepCycles(&StrategyControl->clock);
 	}
 
 	if (num_buf_alloc)
@@ -335,8 +341,8 @@ StrategyInitialize(bool init)
 		 */
 		Assert(init);
 
-		/* Initialize combined clock-sweep pointer/complete passes counter */
-		pg_atomic_init_u64(&StrategyControl->nextVictimBuffer, 0);
+		/* Initialize the clock-sweep algorithm */
+		ClockSweepInit(&StrategyControl->clock, NBuffers);
 
 		/* Clear statistics */
 		pg_atomic_init_u32(&StrategyControl->numBufferAllocs, 0);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4353befab99..1cbfca592d9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -426,6 +426,7 @@ ClientCertName
 ClientConnectionInfo
 ClientData
 ClientSocket
+ClockSweep
 ClonePtrType
 ClosePortalStmt
 ClosePtrType
-- 
2.49.0
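
In case it helps review, the ClockSweep API above can be exercised standalone;
the sketch below keeps the patch's names but uses C11 atomics in place of
pg_atomic_*, a toy ring size, and a loop that stands in (very loosely) for
StrategyGetBuffer():

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

typedef struct ClockSweep
{
	_Atomic uint64_t counter;	/* monotonically increasing hand */
	uint32_t	size;			/* number of buffers in the ring */
} ClockSweep;

static void
ClockSweepInit(ClockSweep *sweep, uint32_t size)
{
	atomic_init(&sweep->counter, 0);
	sweep->size = size;
}

/* Advance the hand and return the index of the buffer it advanced from. */
static inline uint32_t
ClockSweepTick(ClockSweep *sweep)
{
	uint64_t	counter = atomic_fetch_add(&sweep->counter, 1);

	return (uint32_t) ((counter & 0xFFFFFFFF) % sweep->size);
}

/* Current hand position, without moving the hand. */
static inline uint32_t
ClockSweepPosition(ClockSweep *sweep)
{
	uint64_t	counter = atomic_load(&sweep->counter);

	return (uint32_t) ((counter & 0xFFFFFFFF) % sweep->size);
}

/* Number of complete passes the hand has made over the ring. */
static inline uint32_t
ClockSweepCycles(ClockSweep *sweep)
{
	return (uint32_t) (atomic_load(&sweep->counter) / sweep->size);
}

int
main(void)
{
	ClockSweep	clock;

	ClockSweepInit(&clock, 128);

	/* backends hunting for a victim only ever call ClockSweepTick() */
	for (int i = 0; i < 300; i++)
		(void) ClockSweepTick(&clock);

	/* the bgwriter derives its sync start and pass count lock-free */
	printf("hand=%u passes=%u\n",
		   (unsigned) ClockSweepPosition(&clock),
		   (unsigned) ClockSweepCycles(&clock));	/* 44, 2 */
	return 0;
}

Nothing above is meant to replace the patch; it just shows the intended split:
backends advance the hand via ClockSweepTick(), while the bgwriter reads both
its starting point and the pass count from the same counter through
ClockSweepPosition() and ClockSweepCycles().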
