Re: Non-reproducible AIO failure

2025-05-27 Tom Lane
Andres Freund writes:
> I'll see if being graphically logged in somehow indeed increased the repro
> rate, and if so I'll expand the debugging somewhat, or if this was just an
> absurd coincidence.
Hmm. Now that you mention it, the one repro on the M1 came just as I was about to give up and manu

Re: Non-reproducible AIO failure

2025-05-27 Robert Haas
On Sun, May 25, 2025 at 8:25 PM Tom Lane wrote:
> The fact that I can trace through this Assert failure but not the
> AIO one strongly suggests some system-level problem in the latter.
> There is something rotten in the state of Denmark.
I have been quite frustrated with lldb on macOS for a while

Re: Non-reproducible AIO failure

2025-05-27 Andres Freund
Hi,
On 2025-05-27 14:43:14 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I just meant that it seems that I can't reproduce it for some as of yet
> > unknown reason. I've now been through 3k+ runs of 027_stream_regress,
> > without
> > a single failure, so there has to be *something* differe

Re: Non-reproducible AIO failure

2025-05-27 Tom Lane
Andres Freund writes:
> I just meant that it seems that I can't reproduce it for some as of yet
> unknown reason. I've now been through 3k+ runs of 027_stream_regress, without
> a single failure, so there has to be *something* different about my
> environment than yours.
> Darwin m4-dev 24.1.0 Da

Re: Non-reproducible AIO failure

2025-05-27 Andres Freund
Hi,
On 2025-05-27 10:12:28 -0400, Tom Lane wrote:
> Andres Freund writes:
> > This is on a m4 mac mini. I'm wondering if there's some hardware specific
> > memory ordering issue or disk speed based timing issue that I'm just not
> > hitting.
>
> I dunno, I've seen it on three different physical

Re: Non-reproducible AIO failure

2025-05-27 Alexander Lakhin
Hello hackers,
27.05.2025 16:35, Andres Freund wrote:
On 2025-05-25 20:05:49 -0400, Tom Lane wrote:
Thomas Munro writes:
Could you guys please share your exact repro steps?
I've just been running 027_stream_regress.pl over and over. It's not a recommendable answer though because the failure p

Re: Non-reproducible AIO failure

2025-05-27 Alexander Lakhin
Hello Tomas,
27.05.2025 16:26, Tomas Vondra wrote:
I'm interested in how you run these tests in parallel. Can you share the patch/script?
Yeah, sure. I'm running the test as follows:
rm -rf src/test/recovery_*; for i in `seq 40`; do cp -r src/test/recovery/ src/test/recovery_$i/; sed -i .bak

Re: Non-reproducible AIO failure

2025-05-27 Tom Lane
Andres Freund writes:
> This is on a m4 mac mini. I'm wondering if there's some hardware specific
> memory ordering issue or disk speed based timing issue that I'm just not
> hitting.
I dunno, I've seen it on three different physical machines now (one M1, two M4 Pros). But it is darn hard to re

Re: Non-reproducible AIO failure

2025-05-27 Tom Lane
Thomas Munro writes:
> Could you please share your configure options?
The failures on indri and sifaka were during ordinary buildfarm runs, you can check the animals' details on the website. (Note those are same host machine, the difference is that indri uses some MacPorts packages while sifaka i

Re: Non-reproducible AIO failure

2025-05-27 Tomas Vondra
On 5/24/25 23:00, Alexander Lakhin wrote:
> ...
>
> I'm yet to see the Assert triggered on the buildfarm, but this one looks
> interesting too.
>
> (I can share the complete patch + script for such testing, if it can be
> helpful.)
>
I'm interested in how you run these tests in parallel. Can

Re: Non-reproducible AIO failure

2025-05-27 Andres Freund
Hi,
On 2025-05-25 20:05:49 -0400, Tom Lane wrote:
> Thomas Munro writes:
> > Could you guys please share your exact repro steps?
>
> I've just been running 027_stream_regress.pl over and over.
> It's not a recommendable answer though because the failure
> probability is tiny, under 1%. It sound

Re: Non-reproducible AIO failure

2025-05-27 Thomas Munro
On Mon, May 26, 2025 at 12:05 PM Tom Lane wrote:
> Thomas Munro writes:
> > Could you guys please share your exact repro steps?
>
> I've just been running 027_stream_regress.pl over and over.
> It's not a recommendable answer though because the failure
> probability is tiny, under 1%. It sounded

Re: Non-reproducible AIO failure

2025-05-25 Tom Lane
Thomas Munro writes:
> On Sun, May 25, 2025 at 3:22 PM Tom Lane wrote:
>> So far, I've failed to get anything useful out of core files
>> from this failure. The trace goes back no further than
>> (lldb) bt
>> * thread #1
>> * frame #0: 0x00018de39388 libsystem_kernel.dylib`__pthread_kill + 8

Re: Non-reproducible AIO failure

2025-05-25 Tom Lane
Thomas Munro writes:
> Could you guys please share your exact repro steps?
I've just been running 027_stream_regress.pl over and over. It's not a recommendable answer though because the failure probability is tiny, under 1%. It sounded like Alexander had a better way. re

Re: Non-reproducible AIO failure

2025-05-25 Thomas Munro
On Sun, May 25, 2025 at 3:22 PM Tom Lane wrote:
> Thomas Munro writes:
> > Can you get a core and print *ioh in the debugger?
>
> So far, I've failed to get anything useful out of core files
> from this failure. The trace goes back no further than
>
> (lldb) bt
> * thread #1
> * frame #0: 0x00

Re: Non-reproducible AIO failure

2025-05-24 Tom Lane
Thomas Munro writes:
> Can you get a core and print *ioh in the debugger?
So far, I've failed to get anything useful out of core files from this failure. The trace goes back no further than
(lldb) bt
* thread #1
* frame #0: 0x00018de39388 libsystem_kernel.dylib`__pthread_kill + 8
That's

Re: Non-reproducible AIO failure

2025-05-24 Thomas Munro
On Sun, May 25, 2025 at 9:00 AM Alexander Lakhin wrote:
> Hello Thomas,
> 24.05.2025 14:42, Thomas Munro wrote:
> > On Sat, May 24, 2025 at 3:17 PM Tom Lane wrote:
> >> So it seems that "very low-probability issue in our Mac AIO code" is
> >> the most probable description.
> > There isn't any mac

Re: Non-reproducible AIO failure

2025-05-24 Alexander Lakhin
Hello Thomas,
24.05.2025 14:42, Thomas Munro wrote:
On Sat, May 24, 2025 at 3:17 PM Tom Lane wrote:
So it seems that "very low-probability issue in our Mac AIO code" is the most probable description.
There isn't any macOS-specific AIO code so my first guess would be that it might be due to aa

Re: Non-reproducible AIO failure

2025-05-24 Thomas Munro
On Sat, May 24, 2025 at 3:17 PM Tom Lane wrote:
> So it seems that "very low-probability issue in our Mac AIO code" is
> the most probable description.
There isn't any macOS-specific AIO code so my first guess would be that it might be due to aarch64 weak memory reordering (though Andres speculat

Re: Non-reproducible AIO failure

2025-05-23 Tom Lane
Alexander Lakhin writes:
> FWIW, that Assert have just triggered on another mac:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2025-05-23%2020%3A30%3A07
Yeah, I was just looking at that too. There is a corefile from that crash, but lldb seems unable to extract anything from

Re: Non-reproducible AIO failure

2025-05-23 Alexander Lakhin
Hello Tom and Andres,
24.04.2025 01:58, Tom Lane wrote:
Andres Freund writes:
On 2025-04-23 17:17:01 -0400, Tom Lane wrote:
My buildfarm animal sifaka just failed like this [1]:
There's nothing really special about sifaka, is there? I see -std=gnu99 and a few debug -D cppflags, but they don'

Re: Non-reproducible AIO failure

2025-04-23 Tom Lane
Andres Freund writes:
> On 2025-04-23 17:17:01 -0400, Tom Lane wrote:
>> My buildfarm animal sifaka just failed like this [1]:
> There's nothing really special about sifaka, is there? I see -std=gnu99 and a
> few debug -D cppflags, but they don't look they could really be relevant here.
No, it's

Re: Non-reproducible AIO failure

2025-04-23 Andres Freund
Hi,
On 2025-04-23 17:17:01 -0400, Tom Lane wrote:
> My buildfarm animal sifaka just failed like this [1]:
There's nothing really special about sifaka, is there? I see -std=gnu99 and a few debug -D cppflags, but they don't look they could really be relevant here.
> TRAP: failed Assert("aio_ret->