Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
Hi, On 2026-06-02 12:15:40 -0400, Melanie Plageman wrote: > On Mon, Apr 6, 2026 at 4:21 AM Michael Paquier wrote: > > > > On Mon, Mar 30, 2026 at 11:25:55AM +0300, Yura Sokolov wrote: > > > Have you any recommendations for patch? > > > > > > Should I create CF item for patch? > > > > The business with unset_flag_bits is new as of HEAD, so this could > > qualify as an open item, to be tracked here: > > https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items > > There hasn't been recent activity on this thread and it is one of the > oldest open items. Even if we don't have time to review the test right > now, it is probably worth pushing the fix. Though we have just missed > beta1... Ugh. I had lost track of this. I'll get back to it as soon as I have squared the CI stuff away. Greetings, Andres Freund
Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
On Mon, Apr 6, 2026 at 4:21 AM Michael Paquier wrote: > > On Mon, Mar 30, 2026 at 11:25:55AM +0300, Yura Sokolov wrote: > > Have you any recommendations for patch? > > > > Should I create CF item for patch? > > The business with unset_flag_bits is new as of HEAD, so this could > qualify as an open item, to be tracked here: > https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items There hasn't been recent activity on this thread and it is one of the oldest open items. Even if we don't have time to review the test right now, it is probably worth pushing the fix. Though we have just missed beta1... - Melanie
Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
On Mon, Mar 30, 2026 at 11:25:55AM +0300, Yura Sokolov wrote: > Have you any recommendations for patch? > > Should I create CF item for patch? The business with unset_flag_bits is new as of HEAD, so this could qualify as an open item, to be tracked here: https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items -- Michael signature.asc Description: PGP signature
Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
27.03.2026 21:30, Yura Sokolov пишет:
> Good day,
>
> 27.03.2026 12:04, Yura Sokolov wrote:
>> 26.03.2026 23:29, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2026-03-25 17:51:30 +0300, Yura Sokolov wrote:
UnlockBufHdrExt does:
buf_state |= set_bits;
buf_state &= ~unset_bits;
buf_state &= ~BM_LOCKED;
TerminateBufferIO unconditionally does:
unset_flag_bits |= BM_IO_ERROR;
Due to this, AbortBufferIO and buffer_readv_complete_one are failed
to set BM_IO_ERROR with call to TerminateBufferIO.
It was found with proprietary code that was triggered on
PGAIO_RS_ERROR and made assertion on BM_IO_ERROR presence.
>>>
>>> That's clearly not right. Care to write a patch? I think we should add a
>>> test for this in src/test/modules/test_aio too. As we don't rely on things
>>> like BM_IO_ERROR this is quite easy to not notice.
>> I thought, fix is too small to go the way of patches.
>> I don't mind if you just push it.
>>
>> But if I can save your time on writing test, I'll try.
>>
>> ...
>>
>> I believe, proper way is to add assertion in UnlockBufHdrExt:
>>
>> Assert(!(set_bits & unset_bits));
>>
>> And make TerminateBufferIO to exclude BM_IO_ERROR if it is among
>> set_flag_bits.
>>
>> Is it ok?
>
> Here patchset is.
>
> I've tried to modify 001_aio.pl to test presence BM_IO_ERROR after read
> failure.
>
> I've also added FlushBuffer failure test in second patch (v1-002) using
> injection point. I don't know if 001-aio.pl is a good place for since write
> is not async yet.
>
> v1-003 just mades DebugPrintBufferRefcount prettier.
rebased.
--
regards
Yura Sokolov aka funny-falconFrom 5cbaa035027d4f6c57d561ca4f9c5161f7a49471 Mon Sep 17 00:00:00 2001
From: Yura Sokolov
Date: Fri, 27 Mar 2026 20:11:06 +0300
Subject: [PATCH v2 1/3] bufmgr: Fix possibility to set BM_IO_ERROR
Previously it couldn't be set because TerminateBufferIO added BM_IO_ERROR
to unset_flag_bits unconditionally, and UnlockBufHdrExt applied unset_bits
after set_bits.
Fix by not setting BM_IO_ERROR into unset_flag_bits if it is present in
set_flag_bits.
Also protect from possible similar errors by adding assertion to
UnlockBufHdrExt unset_bits and set_bits have no bits in common.
Modify src/test/modules/test_aio/t/001_aio.pl test_io_error to check
presence of BM_IO_ERROR.
---
src/backend/storage/buffer/bufmgr.c | 4 +--
src/include/storage/buf_internals.h | 1 +
src/test/modules/test_aio/t/001_aio.pl | 20 ++
src/test/modules/test_aio/test_aio--1.0.sql | 4 +++
src/test/modules/test_aio/test_aio.c| 30 +
5 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index cd21ae3fc36..a81949aca7c 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -7347,8 +7347,8 @@ TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint64 set_flag_bits,
Assert(buf_state & BM_IO_IN_PROGRESS);
unset_flag_bits |= BM_IO_IN_PROGRESS;
- /* Clear earlier errors, if this IO failed, it'll be marked again */
- unset_flag_bits |= BM_IO_ERROR;
+ /* Clear earlier errors, unless this IO failed as well */
+ unset_flag_bits |= BM_IO_ERROR & ~set_flag_bits;
if (clear_dirty)
unset_flag_bits |= BM_DIRTY | BM_CHECKPOINT_NEEDED;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ad1b7b2216a..93887cea46d 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -475,6 +475,7 @@ UnlockBufHdrExt(BufferDesc *desc, uint64 old_buf_state,
uint64 set_bits, uint64 unset_bits,
int refcount_change)
{
+ Assert(!(set_bits & unset_bits));
for (;;)
{
uint64 buf_state = old_buf_state;
diff --git a/src/test/modules/test_aio/t/001_aio.pl b/src/test/modules/test_aio/t/001_aio.pl
index 63cadd64c15..a48b0ddfba8 100644
--- a/src/test/modules/test_aio/t/001_aio.pl
+++ b/src/test/modules/test_aio/t/001_aio.pl
@@ -344,6 +344,8 @@ SELECT modify_rel_block('tmp_corr', 1, corrupt_header=>true);
$tblname eq 'tbl_corr'
? qr/invalid page in block 1 of relation "base\/\d+\/\d+/
: qr/invalid page in block 1 of relation "base\/\d+\/t\d+_\d+/;
+ # BM_IO_ERROR and BM_TAG_VALID should be set
+ my $debug_print_re = qr/blockNum=1, flags=0x.?a00, refcount=0/;
# verify the error is reported in custom C code
psql_like(
@@ -354,6 +356,12 @@ SELECT modify_rel_block('tmp_corr', 1, corrupt_header=>true);
qr/^$/,
$invalid_page_re);
+ psql_like(
+ $io_method, $psql,
+ "validate flags of $tblname page after read_rel_block_ll()",
+ qq(SELECT debug_print_rel_block('$tblname', 1)),
+ $debug_print_re, qr/^$/);
+
# verify the error is reported for bufmgr reads, seq scan
psql_like(
$io_method, $psql,
@@ -361,6 +369,12 @@ SELECT modify_rel_block('tmp_corr', 1, corrupt_header=>true);
qq(SELECT count(*)
Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
27.03.2026 21:30, Yura Sokolov wrote: >> 26.03.2026 23:29, Andres Freund wrote: >>> >>> That's clearly not right. Care to write a patch? > > Here patchset is. Good day, Andres. Have you any recommendations for patch? Should I create CF item for patch? -- regards Yura Sokolov aka funny-falcon
Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
Good day,
27.03.2026 12:04, Yura Sokolov wrote:
> 26.03.2026 23:29, Andres Freund wrote:
>> Hi,
>>
>> On 2026-03-25 17:51:30 +0300, Yura Sokolov wrote:
>>> UnlockBufHdrExt does:
>>>
>>>buf_state |= set_bits;
>>>buf_state &= ~unset_bits;
>>>buf_state &= ~BM_LOCKED;
>>>
>>> TerminateBufferIO unconditionally does:
>>>
>>>unset_flag_bits |= BM_IO_ERROR;
>>>
>>> Due to this, AbortBufferIO and buffer_readv_complete_one are failed
>>> to set BM_IO_ERROR with call to TerminateBufferIO.
>>>
>>> It was found with proprietary code that was triggered on
>>> PGAIO_RS_ERROR and made assertion on BM_IO_ERROR presence.
>>
>> That's clearly not right. Care to write a patch? I think we should add a
>> test for this in src/test/modules/test_aio too. As we don't rely on things
>> like BM_IO_ERROR this is quite easy to not notice.
> I thought, fix is too small to go the way of patches.
> I don't mind if you just push it.
>
> But if I can save your time on writing test, I'll try.
>
> ...
>
> I believe, proper way is to add assertion in UnlockBufHdrExt:
>
> Assert(!(set_bits & unset_bits));
>
> And make TerminateBufferIO to exclude BM_IO_ERROR if it is among
> set_flag_bits.
>
> Is it ok?
Here patchset is.
I've tried to modify 001_aio.pl to test presence BM_IO_ERROR after read
failure.
I've also added FlushBuffer failure test in second patch (v1-002) using
injection point. I don't know if 001-aio.pl is a good place for since write
is not async yet.
v1-003 just mades DebugPrintBufferRefcount prettier.
--
regards
Yura Sokolov aka funny-falconFrom ef62af742c30f7c3e0c0d0dad0504d18a13596f9 Mon Sep 17 00:00:00 2001
From: Yura Sokolov
Date: Fri, 27 Mar 2026 20:11:06 +0300
Subject: [PATCH v1 1/3] bufmgr: Fix possibility to set BM_IO_ERROR
Previously it couldn't be set because TerminateBufferIO added BM_IO_ERROR
to unset_flag_bits unconditionally, and UnlockBufHdrExt applied unset_bits
after set_bits.
Fix by not setting BM_IO_ERROR into unset_flag_bits if it is present in
set_flag_bits.
Also protect from possible similar errors by adding assertion to
UnlockBufHdrExt unset_bits and set_bits have no bits in common.
Modify src/test/modules/test_aio/t/001_aio.pl test_io_error to check
presence of BM_IO_ERROR.
---
src/backend/storage/buffer/bufmgr.c | 4 +--
src/include/storage/buf_internals.h | 1 +
src/test/modules/test_aio/t/001_aio.pl | 20 ++
src/test/modules/test_aio/test_aio--1.0.sql | 4 +++
src/test/modules/test_aio/test_aio.c| 30 +
5 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e212f6110f2..16b043cd681 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -7172,8 +7172,8 @@ TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint64 set_flag_bits,
Assert(buf_state & BM_IO_IN_PROGRESS);
unset_flag_bits |= BM_IO_IN_PROGRESS;
- /* Clear earlier errors, if this IO failed, it'll be marked again */
- unset_flag_bits |= BM_IO_ERROR;
+ /* Clear earlier errors, unless this IO failed as well */
+ unset_flag_bits |= BM_IO_ERROR & ~set_flag_bits;
if (clear_dirty)
unset_flag_bits |= BM_DIRTY | BM_CHECKPOINT_NEEDED;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 8d1e16b5d51..14a36cfe624 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -475,6 +475,7 @@ UnlockBufHdrExt(BufferDesc *desc, uint64 old_buf_state,
uint64 set_bits, uint64 unset_bits,
int refcount_change)
{
+ Assert(!(set_bits & unset_bits));
for (;;)
{
uint64 buf_state = old_buf_state;
diff --git a/src/test/modules/test_aio/t/001_aio.pl b/src/test/modules/test_aio/t/001_aio.pl
index e18b2a2b8ae..c352dab5a41 100644
--- a/src/test/modules/test_aio/t/001_aio.pl
+++ b/src/test/modules/test_aio/t/001_aio.pl
@@ -328,6 +328,8 @@ SELECT modify_rel_block('tmp_corr', 1, corrupt_header=>true);
$tblname eq 'tbl_corr'
? qr/invalid page in block 1 of relation "base\/\d+\/\d+/
: qr/invalid page in block 1 of relation "base\/\d+\/t\d+_\d+/;
+ # BM_IO_ERROR and BM_TAG_VALID should be set
+ my $debug_print_re = qr/blockNum=1, flags=0x.?a00, refcount=0/;
# verify the error is reported in custom C code
psql_like(
@@ -338,6 +340,12 @@ SELECT modify_rel_block('tmp_corr', 1, corrupt_header=>true);
qr/^$/,
$invalid_page_re);
+ psql_like(
+ $io_method, $psql,
+ "validate flags of $tblname page after read_rel_block_ll()",
+ qq(SELECT debug_print_rel_block('$tblname', 1)),
+ $debug_print_re, qr/^$/);
+
# verify the error is reported for bufmgr reads, seq scan
psql_like(
$io_method, $psql,
@@ -345,6 +353,12 @@ SELECT modify_rel_block('tmp_corr', 1, corrupt_header=>true);
qq(SELECT count(*) FROM $tblname),
qr/^$/, $invalid_page_re);
+ psql_like(
+ $io_method, $psql,
+ "validate flags of
Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
26.03.2026 23:29, Andres Freund wrote: > Hi, > > On 2026-03-25 17:51:30 +0300, Yura Sokolov wrote: >> UnlockBufHdrExt does: >> >>buf_state |= set_bits; >>buf_state &= ~unset_bits; >>buf_state &= ~BM_LOCKED; >> >> TerminateBufferIO unconditionally does: >> >>unset_flag_bits |= BM_IO_ERROR; >> >> Due to this, AbortBufferIO and buffer_readv_complete_one are failed >> to set BM_IO_ERROR with call to TerminateBufferIO. >> >> It was found with proprietary code that was triggered on >> PGAIO_RS_ERROR and made assertion on BM_IO_ERROR presence. > > That's clearly not right. Care to write a patch? I think we should add a > test for this in src/test/modules/test_aio too. As we don't rely on things > like BM_IO_ERROR this is quite easy to not notice. I thought, fix is too small to go the way of patches. I don't mind if you just push it. But if I can save your time on writing test, I'll try. ... I believe, proper way is to add assertion in UnlockBufHdrExt: Assert(!(set_bits & unset_bits)); And make TerminateBufferIO to exclude BM_IO_ERROR if it is among set_flag_bits. Is it ok? -- regards Yura Sokolov aka funny-falcon
Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
Hi, On 2026-03-25 17:51:30 +0300, Yura Sokolov wrote: > UnlockBufHdrExt does: > >buf_state |= set_bits; >buf_state &= ~unset_bits; >buf_state &= ~BM_LOCKED; > > TerminateBufferIO unconditionally does: > >unset_flag_bits |= BM_IO_ERROR; > > Due to this, AbortBufferIO and buffer_readv_complete_one are failed > to set BM_IO_ERROR with call to TerminateBufferIO. > > It was found with proprietary code that was triggered on > PGAIO_RS_ERROR and made assertion on BM_IO_ERROR presence. That's clearly not right. Care to write a patch? I think we should add a test for this in src/test/modules/test_aio too. As we don't rely on things like BM_IO_ERROR this is quite easy to not notice. Greetings, Andres Freund
