Re: IO in wrong state on riscv64

2025-11-08 Thread Thomas Munro
On Sun, Nov 9, 2025 at 8:26 AM Tom Lane wrote: > ... BTW, I wonder why you did not add pg_compiler_barrier_impl() > to our other use of __atomic_thread_fence: > > #if !defined(pg_memory_barrier_impl) > #if defined(HAVE_GCC__ATOMIC_INT32_CAS) > #define pg_memory_barrier_impl() >

Re: IO in wrong state on riscv64

2025-11-08 Thread Tom Lane
... BTW, I wonder why you did not add pg_compiler_barrier_impl() to our other use of __atomic_thread_fence: #if !defined(pg_memory_barrier_impl) #if defined(HAVE_GCC__ATOMIC_INT32_CAS) #define pg_memory_barrier_impl() __atomic_thread_fence(__ATOMIC_SEQ_CST) #elif defined(__

Re: IO in wrong state on riscv64

2025-11-07 Thread Tom Lane
Thomas Munro writes: > On Sat, Nov 8, 2025 at 1:03 PM Tom Lane wrote: >> You're a brave man to be pushing this into the last-ever release >> of v13 with all of 12 hours remaining till code freeze. I don't >> mind so much for the newer branches, but I'm feeling nervous >> about the risk/reward ra

Re: IO in wrong state on riscv64

2025-11-07 Thread Thomas Munro
On Sat, Nov 8, 2025 at 1:03 PM Tom Lane wrote: > You're a brave man to be pushing this into the last-ever release > of v13 with all of 12 hours remaining till code freeze. I don't > mind so much for the newer branches, but I'm feeling nervous > about the risk/reward ratio for v13. I figured "add

Re: IO in wrong state on riscv64

2025-11-07 Thread Andres Freund
Hi, On 2025-11-07 19:03:46 -0500, Tom Lane wrote: > Thomas Munro writes: > > Alexander did some extensive testing and we stared at the codegen on a > > lot of architectures to confirm that this prevents the reordering. > > Pushed and back-patched like that. > > You're a brave man to be pushing t

Re: IO in wrong state on riscv64

2025-11-07 Thread Tom Lane
Thomas Munro writes: > Alexander did some extensive testing and we stared at the codegen on a > lot of architectures to confirm that this prevents the reordering. > Pushed and back-patched like that. You're a brave man to be pushing this into the last-ever release of v13 with all of 12 hours rema

Re: IO in wrong state on riscv64

2025-11-07 Thread Thomas Munro
On Thu, Oct 23, 2025 at 8:52 PM Thomas Munro wrote: > Oh, I think this should work better: > > #define pg_read_barrier_impl() \ > do { pg_compiler_barrier_impl(); > __atomic_thread_fence(__ATOMIC_ACQUIRE); } while (0) Alexander did some extensive testing and we stared at the codegen on a lot of a
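The barrier macro quoted in the message above pairs a compiler-level barrier with an acquire fence. A minimal standalone sketch of that pattern follows, assuming GCC or Clang. The `sketch_*` names, the `Slot` type, and the `publish`/`consume` helpers are hypothetical illustrations and not PostgreSQL code. The empty `asm` statement is an assumed stand-in for `pg_compiler_barrier_impl()`, and the release-side variant is an assumption added for symmetry; the thread itself only shows the read barrier.

```c
#include <assert.h>

/* Assumed stand-in for pg_compiler_barrier_impl(): an empty asm statement
 * with a "memory" clobber, which stops the compiler from moving memory
 * accesses across it. */
#define sketch_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Same shape as the macro quoted in the thread: compiler barrier first,
 * then an acquire fence for the hardware ordering. */
#define sketch_read_barrier() \
    do { sketch_compiler_barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); } while (0)

/* Release-side counterpart; an assumption for illustration only. */
#define sketch_write_barrier() \
    do { sketch_compiler_barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); } while (0)

/* Hypothetical handle: a payload guarded by a state flag. */
typedef struct
{
    int          payload;
    volatile int ready;
} Slot;

/* Writer: store the payload, then publish the flag after the barrier. */
static void
publish(Slot *s, int value)
{
    s->payload = value;
    sketch_write_barrier();
    s->ready = 1;
}

/* Reader: check the flag, then read the payload only after the barrier,
 * so the payload load cannot be hoisted above the flag load. */
static int
consume(Slot *s, int *out)
{
    if (!s->ready)
        return 0;
    sketch_read_barrier();
    *out = s->payload;
    return 1;
}
```

Run in a single thread this only exercises the control flow; it cannot demonstrate the reordering itself, which the thread reports confirming by inspecting the generated code.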

Re: IO in wrong state on riscv64

2025-10-23 Thread Thomas Munro
On Thu, Oct 23, 2025 at 8:00 PM Alexander Lakhin wrote: > Unfortunately, this change doesn't lead to change in disassembly of > pgaio_io_wait(), produced with clang-19 -O1. That is, I'm getting the same > disassembly as the one I sent you before (error/pgaio_io_wait.asm). Oh, I think this should

Re: IO in wrong state on riscv64

2025-10-23 Thread Alexander Lakhin
Hello Thomas, 23.10.2025 00:51, Thomas Munro wrote: On a more practical note, Alexander, does this work? #if !defined(pg_read_barrier_impl) && defined(HAVE_GCC__ATOMIC_INT32_CAS) /* acquire semantics include read barrier semantics */ -# define pg_read_barrier_impl() __atomic_th

Re: IO in wrong state on riscv64

2025-10-22 Thread Thomas Munro
On Wed, Oct 22, 2025 at 11:51 PM Andres Freund wrote: > Have you figured out if the LLVM bug is specific to riscv or is more general? > Afaict most of the MachineSink stuff mentioned in the llvm bug is generic, not > arch specific. But I'd be surprised if such a problem in a more widely used > arc

Re: IO in wrong state on riscv64

2025-10-22 Thread Andres Freund
Hi, On 2025-10-20 13:05:53 +1100, Thomas Munro wrote: > Updating the list with some progress: Alexander narrowed it down to > pgaio_io_wait() being inlined into pgaio_io_was_recycled(). It seems > to have some instructions in the wrong order, and that can be > reproduced standalone, so I've aske

Re: IO in wrong state on riscv64

2025-10-19 Thread Thomas Munro
Updating the list with some progress: Alexander narrowed it down to pgaio_io_wait() being inlined into pgaio_io_was_recycled(). It seems to have some instructions in the wrong order, and that can be reproduced standalone, so I've asked about it here: https://github.com/llvm/llvm-project/issues/1

Re: IO in wrong state on riscv64

2025-10-18 Thread Greg Burd
On Sun, Oct 12, 2025, at 1:00 AM, Alexander Lakhin wrote: > Hello Thomas, > > 12.10.2025 06:35, Thomas Munro wrote: >> On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin >> wrote: >>> 2025-10-11 11:34:46.793 GMT [1169773:1] PANIC: !!!pgaio_io_wait| >>> ioh->state changed from 0 to 1 at iteration

IO in wrong state on riscv64

2025-10-18 Thread Alexander Lakhin
Hello Andres, I've spotted one more interesting AIO-related failure on a riscv64 machine [1]: === dumping /home/gburd/build/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/regression.diffs === diff -U3 /home/gburd/build/HEAD/pgsql/src/test/regress/expected/tsearch.out /home/gburd/bui

Re: IO in wrong state on riscv64

2025-10-18 Thread Alexander Lakhin
Hello Thomas, 13.10.2025 08:40, Thomas Munro wrote: Thanks. All seems to have something plausible in the right places, but I know nothing about RISC-V... hmm, what happens if you replace pg_{read,write}_barrier() with pg_memory_barrier(), in those three functions? And if it happens to help, pe

Re: IO in wrong state on riscv64

2025-10-18 Thread Thomas Munro
On Sun, Oct 12, 2025 at 6:00 PM Alexander Lakhin wrote: > Please find those attached (gdb "disass/m pgaio_io_update_state" misses > the start of the function (but it's still disassembled below), so I > decided to share the whole output). Could you please also disassemble pgaio_io_reclaim()?

Re: IO in wrong state on riscv64

2025-10-18 Thread Alexander Lakhin
13.10.2025 01:44, Thomas Munro wrote: On Sun, Oct 12, 2025 at 6:00 PM Alexander Lakhin wrote: Please find those attached (gdb "disass/m pgaio_io_update_state" misses the start of the function (but it's still disassembled below), so I decided to share the whole output). Could you please also di

Re: IO in wrong state on riscv64

2025-10-18 Thread Thomas Munro
On Tue, Oct 14, 2025 at 5:00 PM Alexander Lakhin wrote: > The replacements doesn't work for me, unfortunately: I have 3 out of 30 > 027_stream_regress test runs failed: > 2025-10-13 21:27:26.161 UTC [4181290:5] pg_regress/brin ERROR: IO in wrong > state: 0 > 2025-10-13 21:27:26.161 UTC [4181290:

Re: IO in wrong state on riscv64

2025-10-18 Thread Alexander Lakhin
Hello Thomas, 12.10.2025 06:35, Thomas Munro wrote: On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin wrote: 2025-10-11 11:34:46.793 GMT [1169773:1] PANIC: !!!pgaio_io_wait| ioh->state changed from 0 to 1 at iteration 0 # no other iteration number observed Can you please disassemble pgaio_io

Re: IO in wrong state on riscv64

2025-10-17 Thread Thomas Munro
On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin wrote: > 2025-10-11 11:34:46.793 GMT [1169773:1] PANIC: !!!pgaio_io_wait| ioh->state > changed from 0 to 1 at iteration 0 > # no other iteration number observed Can you please disassemble pgaio_io_update_state() and pgaio_io_was_recycled()? I wo

Re: IO in wrong state on riscv64

2025-10-13 Thread Thomas Munro
On Mon, Oct 13, 2025 at 5:00 PM Alexander Lakhin wrote: > 13.10.2025 01:44, Thomas Munro wrote: > > On Sun, Oct 12, 2025 at 6:00 PM Alexander Lakhin > > wrote: > >> Please find those attached (gdb "disass/m pgaio_io_update_state" misses > >> the start of the function (but it's still disassembled

Re: IO in wrong state on riscv64

2025-10-13 Thread Thomas Munro
On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin wrote: > I've managed to reproduce it using qemu-system-riscv64 with Debian trixie Huh, that's interesting. What is the host architecture? When I saw that error myself and wondered about memory order, I dismissed the idea of trying with qemu, fig