Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Michael Paquier
On Wed, Jan 25, 2023 at 10:32:10AM +0900, Michael Paquier wrote: > Thanks, my memory was fuzzy regarding that. I am curious if the error > in the recovery tests will persist with that set up. The next run > will be in a few hours, so let's see.. So it looks like tanager is able to reproduce the

Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Michael Paquier
On Tue, Jan 24, 2023 at 07:42:06PM -0500, Tom Lane wrote: > That systemd behavior affects IPC resources regardless of what process > created them. Thanks, my memory was fuzzy regarding that. I am curious if the error in the recovery tests will persist with that set up. The next run will be in a

Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Tom Lane
Michael Paquier writes: > On Wed, Jan 25, 2023 at 01:20:39PM +1300, Thomas Munro wrote: >> Something to do with >> https://www.postgresql.org/docs/current/kernel-resources.html#SYSTEMD-REMOVEIPC >> ? > Still this is unrelated? This is a buildfarm instance, so the backend > does not run with

Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Michael Paquier
On Wed, Jan 25, 2023 at 01:20:39PM +1300, Thomas Munro wrote: > Something to do with > https://www.postgresql.org/docs/current/kernel-resources.html#SYSTEMD-REMOVEIPC > ? Still this is unrelated? This is a buildfarm instance, so the backend does not run with systemd. > The failure I saw looked

Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Thomas Munro
On Wed, Jan 25, 2023 at 1:02 PM Michael Paquier wrote: > Well, this host has a problem, for what looks like a kernel issue, I > guess.. This is repeatable across all the branches, randomly, with > various errors with the POSIX DSM implementation: > # [63cf68b7.5e5a:1] ERROR: could not open

Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Michael Paquier
On Wed, Jan 25, 2023 at 12:40:02PM +1300, Thomas Munro wrote: > I remembered this thread after seeing the failure of Michael's new > build farm animal "tanager". I think we need to solve this somehow... Well, this host has a problem, for what looks like a kernel issue, I guess.. This is

Re: 011_crash_recovery.pl intermittently fails

2023-01-24 Thread Thomas Munro
On Mon, Mar 8, 2021 at 9:32 PM Kyotaro Horiguchi wrote: > At Sun, 07 Mar 2021 20:09:33 -0500, Tom Lane wrote in > > Thomas Munro writes: > > > Thanks! I'm afraid I wouldn't get around to it for a few weeks, so if > > > you have time, please do. (I'm not sure if it's strictly necessary to > >

Re: 011_crash_recovery.pl intermittently fails

2021-03-08 Thread Kyotaro Horiguchi
At Sun, 07 Mar 2021 20:09:33 -0500, Tom Lane wrote in > Thomas Munro writes: > > Thanks! I'm afraid I wouldn't get around to it for a few weeks, so if > > you have time, please do. (I'm not sure if it's strictly necessary to > > log *this* xid, if a higher xid has already been logged,

Re: 011_crash_recovery.pl intermittently fails

2021-03-07 Thread Tom Lane
Thomas Munro writes: > Thanks! I'm afraid I wouldn't get around to it for a few weeks, so if > you have time, please do. (I'm not sure if it's strictly necessary to > log *this* xid, if a higher xid has already been logged, considering > that the goal is just to avoid getting confused about an

Re: 011_crash_recovery.pl intermittently fails

2021-03-07 Thread Thomas Munro
On Mon, Mar 8, 2021 at 1:39 PM Kyotaro Horiguchi wrote: > At Fri, 05 Mar 2021 11:16:55 -0500, Tom Lane wrote in > > Kyotaro Horiguchi writes: > > But, of course, first we need a fix for the bug we now know exists. > > Was anyone volunteering to make the patch? > > Thomas' proposal sounds

Re: 011_crash_recovery.pl intermittently fails

2021-03-07 Thread Kyotaro Horiguchi
At Fri, 05 Mar 2021 11:16:55 -0500, Tom Lane wrote in > Kyotaro Horiguchi writes: > > So I think we need to remove the shared_buffers setting for the > > allows_streaming case in PostgresNode.pm > > That would have uncertain effects on other TAP tests, so I'm disinclined > to do it that way.

Re: 011_crash_recovery.pl intermittently fails

2021-03-07 Thread Kyotaro Horiguchi
At Sat, 6 Mar 2021 10:25:46 +0900, Michael Paquier wrote in > On Fri, Mar 05, 2021 at 11:16:55AM -0500, Tom Lane wrote: > > That would have uncertain effects on other TAP tests, so I'm disinclined > > to do it that way. > > +1. There may be tests out-of-core that rely on this value as >

Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Michael Paquier
On Fri, Mar 05, 2021 at 11:16:55AM -0500, Tom Lane wrote: > That would have uncertain effects on other TAP tests, so I'm disinclined > to do it that way. +1. There may be tests out-of-core that rely on this value as default. -- Michael

Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Tom Lane
Kyotaro Horiguchi writes: > So I think we need to remove the shared_buffers setting for the > allows_streaming case in PostgresNode.pm That would have uncertain effects on other TAP tests, so I'm disinclined to do it that way. 011_crash_recovery.pl doesn't actually use a standby server, so just

Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Kyotaro Horiguchi
At Fri, 05 Mar 2021 16:51:17 +0900 (JST), Kyotaro Horiguchi wrote in > The difference comes from the difference of shared_buffers. In the > "allows_streaming" case, PostgresNode::init() *reduces* the number > down to '1MB' (128 blocks) which leads to only 8 XLOG buffers, which > will very soon be

Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Thomas Munro
On Fri, Mar 5, 2021 at 5:40 PM Tom Lane wrote: > Thomas Munro writes: > > On Fri, Mar 5, 2021 at 5:10 PM Tom Lane wrote: > >> Alternatively, maybe we can salvage the function's usefulness by making it > >> flush WAL before returning? > > > To make pg_xact_status()'s result reliable, don't you

Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Kyotaro Horiguchi
At Fri, 5 Mar 2021 13:20:53 +0500, Andrey Borodin wrote in > > On 5 March 2021, at 13:00, Kyotaro Horiguchi > > wrote: > > > > The problem records have 15 pages of FPIs. The reduction of their > > size may prevent WAL-buffer wrap around and wal writes. If no wal is > > written the test

Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Andrey Borodin
> On 5 March 2021, at 13:00, Kyotaro Horiguchi > wrote: > > The problem records have 15 pages of FPIs. The reduction of their > size may prevent WAL-buffer wrap around and wal writes. If no wal is > written the test fails. Thanks, I've finally understood the root cause. So, test

Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Kyotaro Horiguchi
(Sorry for my slippery fingers.) At Fri, 5 Mar 2021 10:07:06 +0500, Andrey Borodin wrote in > Maybe it's offtopic here, but anyway... > While working on "lz4 for FPIs" I've noticed that this test fails with > wal_compression = on. > I did not investigate the case at that moment, but I think

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Kyotaro Horiguchi
At Fri, 5 Mar 2021 10:07:06 +0500, Andrey Borodin wrote in > Maybe it's offtopic here, but anyway... > While working on "lz4 for FPIs" I've noticed that this test fails with > wal_compression = on. > I did not investigate the case at that moment, but I think that it would be > good to run

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Kyotaro Horiguchi
At Thu, 04 Mar 2021 23:40:34 -0500, Tom Lane wrote in > BTW, I tried simply removing the "allows_streaming" option from > the test, and it failed ten times out of ten tries for me. > So Andres is right that that makes it pretty reproducible in > a stock build. The difference comes from the

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Andrey Borodin
> On 5 March 2021, at 08:32, Tom Lane wrote: > > Kyotaro Horiguchi writes: >> I noticed that 011_crash_recovery.pl intermittently (that being said, >> one out of three or so on my environment) fails in the second test. > > Hmmm ... what environment is that? This test script hasn't

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Kyotaro Horiguchi
At Fri, 05 Mar 2021 13:21:48 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 05 Mar 2021 13:13:04 +0900 (JST), Kyotaro Horiguchi > wrote in > > At Thu, 04 Mar 2021 23:02:09 -0500, Tom Lane wrote in > > > Having said that, it's still true that this test has been stable in > > > the

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Tom Lane
Thomas Munro writes: > On Fri, Mar 5, 2021 at 5:10 PM Tom Lane wrote: >> Alternatively, maybe we can salvage the function's usefulness by making it >> flush WAL before returning? > To make pg_xact_status()'s result reliable, don't you need to make > pg_current_xact_id() flush? In other words,

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Thomas Munro
On Fri, Mar 5, 2021 at 5:10 PM Tom Lane wrote: > I wrote: > > I'd be kind of inclined to remove this test script altogether, on the > > grounds that it's wasting cycles on a function that doesn't really > > do what is claimed (and we should remove the documentation claim, too). > > Alternatively,

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Kyotaro Horiguchi
At Fri, 05 Mar 2021 13:13:04 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 04 Mar 2021 23:02:09 -0500, Tom Lane wrote in > > Peter Geoghegan writes: > > > On Thu, Mar 4, 2021 at 7:32 PM Tom Lane wrote: > > >> Hmmm ... what environment is that? This test script hasn't changed > > >>

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Kyotaro Horiguchi
At Thu, 04 Mar 2021 23:02:09 -0500, Tom Lane wrote in > Peter Geoghegan writes: > > On Thu, Mar 4, 2021 at 7:32 PM Tom Lane wrote: > >> Hmmm ... what environment is that? This test script hasn't changed > >> meaningfully in several years, and we have not seen any real issues > >> with it up

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Tom Lane
I wrote: > I'd be kind of inclined to remove this test script altogether, on the > grounds that it's wasting cycles on a function that doesn't really > do what is claimed (and we should remove the documentation claim, too). Alternatively, maybe we can salvage the function's usefulness by making

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Tom Lane
Peter Geoghegan writes: > On Thu, Mar 4, 2021 at 7:32 PM Tom Lane wrote: >> Hmmm ... what environment is that? This test script hasn't changed >> meaningfully in several years, and we have not seen any real issues >> with it up to now. > Did you see this recent thread? >

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Peter Geoghegan
On Thu, Mar 4, 2021 at 7:32 PM Tom Lane wrote: > Hmmm ... what environment is that? This test script hasn't changed > meaningfully in several years, and we have not seen any real issues > with it up to now. Did you see this recent thread?

Re: 011_crash_recovery.pl intermittently fails

2021-03-04 Thread Tom Lane
Kyotaro Horiguchi writes: > I noticed that 011_crash_recovery.pl intermittently (that being said, > one out of three or so on my environment) fails in the second test. Hmmm ... what environment is that? This test script hasn't changed meaningfully in several years, and we have not seen any real

011_crash_recovery.pl intermittently fails

2021-03-04 Thread Kyotaro Horiguchi
Hello. I noticed that 011_crash_recovery.pl intermittently (that being said, one out of three or so on my environment) fails in the second test. > t/011_crash_recovery.pl .. 2/3 > # Failed test 'new xid after restart is greater' > # at t/011_crash_recovery.pl line 56. > # '539' > #