Re: WIP: WAL prefetch (another approach)

2022-09-24 Thread Thomas Munro
On Wed, Apr 13, 2022 at 8:05 AM Thomas Munro wrote: > On Wed, Apr 13, 2022 at 3:57 AM Dagfinn Ilmari Mannsåker > wrote: > > Simon Riggs writes: > > > This is a nice feature if it is safe to turn off full_page_writes. > > > When is it safe to do that? On which platform? > > > > > > I am not

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-08 Thread Thomas Munro
On Wed, Sep 7, 2022 at 1:56 AM Jonathan S. Katz wrote: > To close this loop, I added a section for "fixed before RC1" to Open > Items since this is presumably the next release. We can include it there > once committed. Done yesterday. To tie up a couple of loose ends from this thread: On Thu,

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-06 Thread Jonathan S. Katz
On 9/5/22 10:03 PM, Thomas Munro wrote: On Tue, Sep 6, 2022 at 1:51 PM Tom Lane wrote: "Jonathan S. Katz" writes: On 9/5/22 7:18 PM, Thomas Munro wrote: Well I was about to commit this, but beta4 just got stamped (but not yet tagged). I see now that Jonathan (with RMT hat on, CC'd) meant

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-05 Thread Thomas Munro
On Tue, Sep 6, 2022 at 1:51 PM Tom Lane wrote: > "Jonathan S. Katz" writes: > > On 9/5/22 7:18 PM, Thomas Munro wrote: > >> Well I was about to commit this, but beta4 just got stamped (but not > >> yet tagged). I see now that Jonathan (with RMT hat on, CC'd) meant > >> commits should be in by

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-05 Thread Tom Lane
"Jonathan S. Katz" writes: > On 9/5/22 7:18 PM, Thomas Munro wrote: >> Well I was about to commit this, but beta4 just got stamped (but not >> yet tagged). I see now that Jonathan (with RMT hat on, CC'd) meant >> commits should be in by the *start* of the 5th AoE, not the end. So >> the

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-05 Thread Jonathan S. Katz
On 9/5/22 7:18 PM, Thomas Munro wrote: On Mon, Sep 5, 2022 at 9:08 PM Thomas Munro wrote: At Mon, 05 Sep 2022 14:15:27 +0900 (JST), Kyotaro Horiguchi wrote in At Mon, 5 Sep 2022 16:54:07 +1200, Thomas Munro wrote in On reflection, it'd be better not to clobber any pre-existing error

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-05 Thread Kyotaro Horiguchi
At Mon, 5 Sep 2022 21:08:16 +1200, Thomas Munro wrote in > We also need the LSN that is past that record. > XLogReleasePreviousRecord() could return it (or we could use > reader->EndRecPtr I suppose). Thoughts on this version? (Catching the gap...) It is easier to read. Thanks! regards. --

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-05 Thread Thomas Munro
On Mon, Sep 5, 2022 at 9:08 PM Thomas Munro wrote: > > At Mon, 05 Sep 2022 14:15:27 +0900 (JST), Kyotaro Horiguchi > > wrote in > > At Mon, 5 Sep 2022 16:54:07 +1200, Thomas Munro > > wrote in > > > On reflection, it'd be better not to clobber any pre-existing error > > > there, but report

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-05 Thread Thomas Munro
On Mon, Sep 5, 2022 at 5:34 PM Kyotaro Horiguchi wrote: > At Mon, 05 Sep 2022 14:15:27 +0900 (JST), Kyotaro Horiguchi > wrote in > me> +1 for showing any message for the failure, but I think we shouldn't > me> hide an existing message if any. > > At Mon, 5 Sep 2022 16:54:07 +1200, Thomas Munro

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-04 Thread Kyotaro Horiguchi
(the previous mail was crossing with yours..) At Mon, 05 Sep 2022 14:15:27 +0900 (JST), Kyotaro Horiguchi wrote in me> +1 for showing any message for the failure, but I think we shouldn't me> hide an existing message if any. At Mon, 5 Sep 2022 16:54:07 +1200, Thomas Munro wrote in > On

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-04 Thread Kyotaro Horiguchi
At Mon, 5 Sep 2022 13:28:12 +1200, Thomas Munro wrote in > I had this more or less figured out on Friday when I wrote last, but I > got stuck on a weird problem with 026_overwrite_contrecord.pl. I > think that failure case should report an error, no? I find it strange > that we end recovery

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-04 Thread Thomas Munro
On Mon, Sep 5, 2022 at 1:28 PM Thomas Munro wrote: > I had this more or less figured out on Friday when I wrote last, but I > got stuck on a weird problem with 026_overwrite_contrecord.pl. I > think that failure case should report an error, no? I find it strange > that we end recovery in

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-04 Thread Thomas Munro
On Fri, Sep 2, 2022 at 6:20 PM Thomas Munro wrote: > ... The active ingredient here is a setting of > maintenance_io_concurrency=0, which runs into a dumb accounting problem > of the fencepost variety and incorrectly concludes it's reached the > end early. Setting it to 3 or higher allows his

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-02 Thread Thomas Munro
On Thu, Sep 1, 2022 at 11:18 PM Thomas Munro wrote: > Ahh, problem repro'd here with WAL compression. More soon. I followed some false pistes for a while there, but I finally figured out what's happening here after Justin kindly shared some files with me. The active ingredient here is a

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-01 Thread Thomas Munro
On Thu, Sep 1, 2022 at 5:52 PM Justin Pryzby wrote: > compression method: zstd Ahh, problem repro'd here with WAL compression. More soon.

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Justin Pryzby
On Thu, Sep 01, 2022 at 05:35:23PM +1200, Thomas Munro wrote: > So it *looks* like it finished early (and without the expected > error?). But it also looks like it replayed that record, according to > the page LSN. So which is it? Could you recompile with WAL_DEBUG > defined in

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Thomas Munro
On Thu, Sep 1, 2022 at 5:18 PM Kyotaro Horiguchi wrote: > At Wed, 31 Aug 2022 23:47:53 -0500, Justin Pryzby > wrote in > > On Thu, Sep 01, 2022 at 04:22:20PM +1200, Thomas Munro wrote: > > > Hmm. Justin, when you built from source, which commit were you at? > > > If it's REL_15_BETA3, > > > >

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Kyotaro Horiguchi
At Wed, 31 Aug 2022 23:47:53 -0500, Justin Pryzby wrote in > On Thu, Sep 01, 2022 at 04:22:20PM +1200, Thomas Munro wrote: > > On Thu, Sep 1, 2022 at 3:08 PM Kyotaro Horiguchi > > wrote: > > > Just for information, there was a fixed bug about > > > overwrite-aborted-contrecord feature, which

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Justin Pryzby
On Thu, Sep 01, 2022 at 04:22:20PM +1200, Thomas Munro wrote: > On Thu, Sep 1, 2022 at 3:08 PM Kyotaro Horiguchi > wrote: > > At Thu, 1 Sep 2022 12:05:36 +1200, Thomas Munro > > wrote in > > > On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby wrote: > > > > < 2022-08-31 08:44:10.495 CDT >LOG:

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Thomas Munro
On Thu, Sep 1, 2022 at 3:08 PM Kyotaro Horiguchi wrote: > At Thu, 1 Sep 2022 12:05:36 +1200, Thomas Munro > wrote in > > On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby wrote: > > > < 2022-08-31 08:44:10.495 CDT >LOG: checkpoint starting: > > > end-of-recovery immediate wait > > > < 2022-08-31

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Kyotaro Horiguchi
At Thu, 1 Sep 2022 12:05:36 +1200, Thomas Munro wrote in > On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby wrote: > > < 2022-08-31 08:44:10.495 CDT >LOG: checkpoint starting: end-of-recovery > > immediate wait > > < 2022-08-31 08:44:10.609 CDT >LOG: request to flush past end of > >

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Justin Pryzby
Some more details, in case they're important: First: the server has wal_compression=zstd (I wonder if something doesn't allow/accommodate compressed FPI?) I thought to mention that after compiling pg15 locally and forgetting to use --with-zstd. I compiled it to enable your debug logging, which

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Thomas Munro
On Thu, Sep 1, 2022 at 12:53 PM Justin Pryzby wrote: > Yes, I have a copy that reproduces the issue: That's good news. So the last record touching that page was: > rmgr: Heap2 len (rec/tot): 59/59, tx: 0, lsn: > 1201/1CAF84B0, prev 1201/1CAF8478, desc: VISIBLE cutoff

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Justin Pryzby
On Thu, Sep 01, 2022 at 12:05:36PM +1200, Thomas Munro wrote: > On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby wrote: > > < 2022-08-31 08:44:10.495 CDT >LOG: checkpoint starting: end-of-recovery > > immediate wait > > < 2022-08-31 08:44:10.609 CDT >LOG: request to flush past end of > >

Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Thomas Munro
On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby wrote: > < 2022-08-31 08:44:10.495 CDT >LOG: checkpoint starting: end-of-recovery > immediate wait > < 2022-08-31 08:44:10.609 CDT >LOG: request to flush past end of generated > WAL; request 1201/1CAF84F0, current position 1201/1CADB730 > <
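
For readers following the log line quoted above, here is a minimal sketch of how the two LSNs compare. It relies only on the standard textual LSN format (two hexadecimal halves separated by a slash, forming a 64-bit WAL byte position); the helper names parse_lsn and lsn_gap are hypothetical and are not part of PostgreSQL.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '1201/1CAF84F0' into a 64-bit byte position."""
    high, low = text.strip().split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_gap(requested: str, current: str) -> int:
    """Return how many bytes `requested` lies beyond `current` (negative if behind)."""
    return parse_lsn(requested) - parse_lsn(current)


if __name__ == "__main__":
    # Values copied from the quoted "request to flush past end of generated WAL" message.
    print(lsn_gap("1201/1CAF84F0", "1201/1CADB730"))
```

Under these assumptions the flush request points 118208 bytes beyond the position the server believed it had generated. On a running server the same subtraction can be done in SQL with pg_wal_lsn_diff().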

pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Justin Pryzby
An internal VM crashed last night due to OOM. When I tried to start postgres, it failed like: < 2022-08-31 08:44:10.495 CDT >LOG: checkpoint starting: end-of-recovery immediate wait < 2022-08-31 08:44:10.609 CDT >LOG: request to flush past end of generated WAL; request 1201/1CAF84F0,

Re: WIP: WAL prefetch (another approach)

2022-04-26 Thread Thomas Munro
On Tue, Apr 26, 2022 at 6:11 PM Thomas Munro wrote: > I will poke some more tomorrow to try to confirm this and try to come > up with a fix. Done, and moved over to the pg_walinspect commit thread to reach the right eyeballs:

Re: WIP: WAL prefetch (another approach)

2022-04-26 Thread Thomas Munro
On Tue, Apr 26, 2022 at 6:11 AM Tom Lane wrote: > I believe that the WAL prefetch patch probably accounts for the > intermittent errors that buildfarm member topminnow has shown > since it went in, eg [1]: > > diff -U3 > /home/nm/ext4/HEAD/pgsql/contrib/pg_walinspect/expected

Re: WIP: WAL prefetch (another approach)

2022-04-25 Thread Tom Lane
Oh, one more bit of data: here's an excerpt from pg_waldump output after the failed test: rmgr: Btree len (rec/tot): 72/72, tx:727, lsn: 0/01903BC8, prev 0/01903B70, desc: INSERT_LEAF off 111, blkref #0: rel 1663/16384/2673 blk 9 rmgr: Btree len (rec/tot): 72/

Re: WIP: WAL prefetch (another approach)

2022-04-25 Thread Tom Lane
I believe that the WAL prefetch patch probably accounts for the intermittent errors that buildfarm member topminnow has shown since it went in, eg [1]: diff -U3 /home/nm/ext4/HEAD/pgsql/contrib/pg_walinspect/expected/pg_walinspect.out /home/nm/ext4/HEAD/pgsql.build/contrib/pg_walinspect/results

Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Thomas Munro
On Wed, Apr 13, 2022 at 3:57 AM Dagfinn Ilmari Mannsåker wrote: > Simon Riggs writes: > > This is a nice feature if it is safe to turn off full_page_writes. As others have said/shown, it does also help if a block with FPW is evicted and then read back in during one checkpoint cycle, in other

Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread SATYANARAYANA NARLAPURAM
to memory in advance when they are evicted. This speeds up the replay and is cost effective. 2/ Allows larger checkpoint_timeout for the same recovery SLA and perhaps improved performance? 3/ WAL prefetch (not pages by itself) can improve replay by itself (not sure if it was measured in isolation, To

Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Tomas Vondra
On 4/12/22 17:46, Simon Riggs wrote: > On Tue, 12 Apr 2022 at 16:41, Tomas Vondra > wrote: >> >> On 4/12/22 15:58, Simon Riggs wrote: >>> On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote: >>> With that... I've finally pushed the 0002 patch and will be watching the build farm. >>> >>>

Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Dagfinn Ilmari Mannsåker
Simon Riggs writes: > On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote: > >> With that... I've finally pushed the 0002 patch and will be watching >> the build farm. > > This is a nice feature if it is safe to turn off full_page_writes. > > When is it safe to do that? On which platform? > > I am

Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Simon Riggs
On Tue, 12 Apr 2022 at 16:41, Tomas Vondra wrote: > > On 4/12/22 15:58, Simon Riggs wrote: > > On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote: > > > >> With that... I've finally pushed the 0002 patch and will be watching > >> the build farm. > > > > This is a nice feature if it is safe to turn

Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Tomas Vondra
On 4/12/22 15:58, Simon Riggs wrote: > On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote: > >> With that... I've finally pushed the 0002 patch and will be watching >> the build farm. > > This is a nice feature if it is safe to turn off full_page_writes. > > When is it safe to do that? On which

Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Simon Riggs
On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote: > With that... I've finally pushed the 0002 patch and will be watching > the build farm. This is a nice feature if it is safe to turn off full_page_writes. When is it safe to do that? On which platform? I am not aware of any released software

RE: WIP: WAL prefetch (another approach)

2022-04-12 Thread Shinoda, Noriyoshi (PN Japan FSIP)
; Alvaro Herrera ; Tomas Vondra ; Dmitry Dolgov <9erthali...@gmail.com>; David Steele ; pgsql-hackers Subject: Re: WIP: WAL prefetch (another approach) On Tue, Apr 12, 2022 at 9:03 PM Shinoda, Noriyoshi (PN Japan FSIP) wrote: > Thank you for developing the great feature. I tested thi

Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Thomas Munro
On Tue, Apr 12, 2022 at 9:03 PM Shinoda, Noriyoshi (PN Japan FSIP) wrote: > Thank you for developing the great feature. I tested this feature and checked > the documentation. Currently, the documentation for the > pg_stat_prefetch_recovery view is included in the description for the >

RE: WIP: WAL prefetch (another approach)

2022-04-12 Thread Shinoda, Noriyoshi (PN Japan FSIP)
--Original Message- From: Thomas Munro Sent: Friday, April 8, 2022 10:47 AM To: Justin Pryzby Cc: Tomas Vondra ; Stephen Frost ; Andres Freund ; Jakub Wartak ; Alvaro Herrera ; Tomas Vondra ; Dmitry Dolgov <9erthali...@gmail.com>; David Steele ; pgsql-hackers Subject: Re: WIP: WAL p

Re: WIP: WAL prefetch (another approach)

2022-04-07 Thread Thomas Munro
On Fri, Apr 8, 2022 at 12:55 AM Justin Pryzby wrote: > The docs seem to be wrong about the default. > > +are not yet in the buffer pool, during recovery. Valid values are > +off (the default), on and > +try. The setting try enables Fixed. > + concurrency and

Re: WIP: WAL prefetch (another approach)

2022-04-07 Thread Justin Pryzby
The docs seem to be wrong about the default. +are not yet in the buffer pool, during recovery. Valid values are +off (the default), on and +try. The setting try enables + concurrency and distance, respectively. By default, it is set to + try, which enabled the

Re: WIP: WAL prefetch (another approach)

2022-04-07 Thread Thomas Munro
On Mon, Apr 4, 2022 at 3:12 PM Julien Rouhaud wrote: > [review] Thanks! I took almost all of your suggestions about renaming things, comments, docs and moving a magic number into a macro. Minor changes: 1. Rebased over the shmem stats changes and others that have just landed today (woo!).

Re: WIP: WAL prefetch (another approach)

2022-04-03 Thread Julien Rouhaud
On Thu, Mar 31, 2022 at 10:49:32PM +1300, Thomas Munro wrote: > On Mon, Mar 21, 2022 at 9:29 PM Julien Rouhaud wrote: > > So I finally finished looking at this patch. Here again, AFAICS the > > feature is > > working as expected and I didn't find any problem. I just have some minor > >

Re: WIP: WAL prefetch (another approach)

2022-03-31 Thread Thomas Munro
On Mon, Mar 21, 2022 at 9:29 PM Julien Rouhaud wrote: > So I finally finished looking at this patch. Here again, AFAICS the feature > is > working as expected and I didn't find any problem. I just have some minor > comments, like for the previous patch. Thanks very much for the review. I've

Re: WIP: WAL prefetch (another approach)

2022-03-21 Thread Julien Rouhaud
Hi, On Sun, Mar 20, 2022 at 05:36:38PM +1300, Thomas Munro wrote: > On Fri, Mar 18, 2022 at 9:59 AM Thomas Munro wrote: > > I'll push 0001 today to let the build farm chew on it for a few days > > before moving to 0002. > > Clearly 018_wal_optimize.pl is flapping and causing recoveryCheck to >

Re: WIP: WAL prefetch (another approach)

2022-03-20 Thread Thomas Munro
On Sun, Mar 20, 2022 at 5:36 PM Thomas Munro wrote: > Clearly 018_wal_optimize.pl is flapping Correction, 019_replslot_limit.pl, discussed at https://www.postgresql.org/message-id/flat/83b46e5f-2a52-86aa-fa6c-8174908174b8%40iki.fi .

Re: WIP: WAL prefetch (another approach)

2022-03-19 Thread Thomas Munro
On Fri, Mar 18, 2022 at 9:59 AM Thomas Munro wrote: > I'll push 0001 today to let the build farm chew on it for a few days > before moving to 0002. Clearly 018_wal_optimize.pl is flapping and causing recoveryCheck to fail occasionally, but that predates the above commit. I didn't follow the

Re: WIP: WAL prefetch (another approach)

2022-03-17 Thread Thomas Munro
On Mon, Mar 14, 2022 at 8:17 PM Julien Rouhaud wrote: > Great! I'm happy with 0001 and I think it's good to go! I'll push 0001 today to let the build farm chew on it for a few days before moving to 0002.

Re: WIP: WAL prefetch (another approach)

2022-03-14 Thread Julien Rouhaud
On Mon, Mar 14, 2022 at 06:15:59PM +1300, Thomas Munro wrote: > On Fri, Mar 11, 2022 at 9:27 PM Julien Rouhaud wrote: > > > > Also, is it worth an assert (likely at the top of the function) for > > > > that? > > > > > > How could I assert that EndRecPtr has the right value? > > > > Sorry, I

Re: WIP: WAL prefetch (another approach)

2022-03-11 Thread Julien Rouhaud
On Fri, Mar 11, 2022 at 06:31:13PM +1300, Thomas Munro wrote: > On Wed, Mar 9, 2022 at 7:47 PM Julien Rouhaud wrote: > > > > This could use XLogRecGetBlock? Note that this macro is for now never used. > > xlogreader.c also has some similar forgotten code that could use > > XLogRecMaxBlockId. > >

Re: WIP: WAL prefetch (another approach)

2022-03-10 Thread Andres Freund
On March 10, 2022 9:31:13 PM PST, Thomas Munro wrote: > The other thing I need to change is that I should turn on >recovery_prefetch for platforms that support it (ie Linux and maybe >NetBSD only for now), in the tests. Could a setting of "try" make sense? -- Sent from my Android device

Re: WIP: WAL prefetch (another approach)

2022-03-10 Thread Thomas Munro
On Fri, Mar 11, 2022 at 6:31 PM Thomas Munro wrote: > Thanks for your review of 0001! It gave me a few things to think > about and some good improvements. And just in case it's useful, here's what changed between v21 and v22.. diff --git a/src/backend/access/transam/xlogreader.c

Re: WIP: WAL prefetch (another approach)

2022-03-08 Thread Julien Rouhaud
Hi, On Tue, Mar 08, 2022 at 06:15:43PM +1300, Thomas Munro wrote: > On Wed, Dec 29, 2021 at 5:29 PM Thomas Munro wrote: > > https://github.com/macdice/postgres/tree/recovery-prefetch-ii > > Here's a rebase. This mostly involved moving hunks over to the new > xlogrecovery.c file. One thing that

Re: WIP: WAL prefetch (another approach)

2022-03-08 Thread Andres Freund
Hi, On 2022-03-08 18:15:43 +1300, Thomas Munro wrote: > I'm now starting to think about committing this soon. +1 Are you thinking of committing both patches at once, or with a bit of distance? I think something in the regression tests ought to enable recovery_prefetch. 027_stream_regress or

Re: WIP: WAL prefetch (another approach)

2022-03-08 Thread Tomas Vondra
On 3/8/22 06:15, Thomas Munro wrote: > On Wed, Dec 29, 2021 at 5:29 PM Thomas Munro wrote: >> https://github.com/macdice/postgres/tree/recovery-prefetch-ii > > Here's a rebase. This mostly involved moving hunks over to the new > xlogrecovery.c file. One thing that seemed a little strange to

Re: WIP: WAL prefetch (another approach)

2021-12-29 Thread Andres Freund
Hi, On 2021-12-29 17:29:52 +1300, Thomas Munro wrote: > > FWIW I don't think we include updates to typedefs.list in patches. > > Seems pretty harmless? And useful to keep around in development > branches because I like to pgindent stuff... I think it's even helpful. As long as it's done with a

Re: WIP: WAL prefetch (another approach)

2021-12-28 Thread Tom Lane
Thomas Munro writes: >> FWIW I don't think we include updates to typedefs.list in patches. > Seems pretty harmless? And useful to keep around in development > branches because I like to pgindent stuff... As far as that goes, my habit is to pull down

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Tom Lane
Greg Stark writes: > But the bigger question is. Are we really concerned about this flaky > problem? Is it worth investing time and money on? I can get money to > go buy a G4 or G5 and spend some time on it. It just seems a bit... > niche. But if it's a real bug that represents something broken

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Greg Stark
On Fri, 17 Dec 2021 at 18:40, Tom Lane wrote: > > Greg Stark writes: > > Hm. I seem to have picked a bad checkout. I took the last one before > > the revert (45aa88fe1d4028ea50ba7d26d390223b6ef78acc). > > FWIW, I think that's the first one *after* the revert. Doh But the bigger question is.

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Tom Lane
Greg Stark writes: > Hm. I seem to have picked a bad checkout. I took the last one before > the revert (45aa88fe1d4028ea50ba7d26d390223b6ef78acc). FWIW, I think that's the first one *after* the revert. > 2021-12-17 17:51:51.688 EST [50955] LOG: background worker "parallel > worker" (PID 54073)

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Tomas Vondra
On 12/17/21 23:56, Greg Stark wrote: Hm. I seem to have picked a bad checkout. I took the last one before the revert (45aa88fe1d4028ea50ba7d26d390223b6ef78acc). Or there's some incompatibility with the emulation and the IPC stuff parallel workers use. 2021-12-17 17:51:51.688 EST [50955] LOG:

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Greg Stark
Hm. I seem to have picked a bad checkout. I took the last one before the revert (45aa88fe1d4028ea50ba7d26d390223b6ef78acc). Or there's some incompatibility with the emulation and the IPC stuff parallel workers use. 2021-12-17 17:51:51.688 EST [50955] LOG: background worker "parallel worker"

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Tom Lane
Greg Stark writes: > I'm guessing I should do CC=/usr/bin/powerpc-apple-darwin9-gcc-4.2.1 > or maybe 4.0.1. What version is on your G4? $ gcc -v Using built-in specs. Target: powerpc-apple-darwin9 Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Greg Stark
I have IBUILD:postgresql gsstark$ ls /usr/bin/*gcc* /usr/bin/gcc /usr/bin/gcc-4.0 /usr/bin/gcc-4.2 /usr/bin/i686-apple-darwin9-gcc-4.0.1 /usr/bin/i686-apple-darwin9-gcc-4.2.1 /usr/bin/powerpc-apple-darwin9-gcc-4.0.1 /usr/bin/powerpc-apple-darwin9-gcc-4.2.1 I'm guessing I should do

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Tom Lane
Greg Stark writes: > What tools and tool versions are you using to build? Is it just GCC for PPC? > There aren't any special build processes to make a fat binary involved? Nope, just "configure; make" using that macOS version's regular gcc. regards, tom lane

Re: WIP: WAL prefetch (another approach)

2021-12-17 Thread Greg Stark
What tools and tool versions are you using to build? Is it just GCC for PPC? There aren't any special build processes to make a fat binary involved? On Thu, 16 Dec 2021 at 23:11, Tom Lane wrote: > > Greg Stark writes: > > But if you're interested and can explain the tests to run I can try to >

Re: WIP: WAL prefetch (another approach)

2021-12-16 Thread Tom Lane
Greg Stark writes: > But if you're interested and can explain the tests to run I can try to > get the tests running on this machine: I'm not sure that machine is close enough to prove much, but by all means give it a go if you wish. My test setup was explained in [1]: >> To recap, the test

Re: WIP: WAL prefetch (another approach)

2021-12-16 Thread Greg Stark
The actual hardware of this machine is a Mac Mini Core 2 Duo. I'm not really clear how the emulation is done and whether it makes a reasonable test environment or not. Hardware Overview: Model Name: Mac mini Model Identifier: Macmini2,1 Processor Name: Intel Core 2 Duo

Re: WIP: WAL prefetch (another approach)

2021-12-16 Thread Greg Stark
On Fri, 26 Nov 2021 at 21:47, Tom Lane wrote: > > Yeah ... on the one hand, that machine has shown signs of > hard-to-reproduce flakiness, so it's easy to write off the failures > I saw as hardware issues. On the other hand, the flakiness I've > seen has otherwise manifested as kernel crashes,

Re: WIP: WAL prefetch (another approach)

2021-12-13 Thread Robert Haas
On Fri, Nov 26, 2021 at 9:47 PM Tom Lane wrote: > Yeah ... on the one hand, that machine has shown signs of > hard-to-reproduce flakiness, so it's easy to write off the failures > I saw as hardware issues. On the other hand, the flakiness I've > seen has otherwise manifested as kernel crashes,

Re: WIP: WAL prefetch (another approach)

2021-12-10 Thread Ashutosh Sharma
Hi Thomas, I am unable to apply this new set of patches on HEAD. Can you please share the rebased patch or, if you have any work branch, can you please point it out? I will refer to it for the changes. -- With Regards, Ashutosh sharma. On Tue, Nov 23, 2021 at 3:44 PM Thomas Munro wrote: > On

Re: WIP: WAL prefetch (another approach)

2021-11-26 Thread Tom Lane
Thomas Munro writes: > On Sat, Nov 27, 2021 at 12:34 PM Tomas Vondra > wrote: >> One thing that's not clear to me is what happened to the reasons why >> this feature was reverted in the PG14 cycle? > 3. A wild goose chase for bugs on Tom Lane's antique 32 bit PPC > machine. Tom eventually

Re: WIP: WAL prefetch (another approach)

2021-11-26 Thread Thomas Munro
On Sat, Nov 27, 2021 at 12:34 PM Tomas Vondra wrote: > One thing that's not clear to me is what happened to the reasons why > this feature was reverted in the PG14 cycle? Reasons for reverting: 1. A bug in commit 323cbe7c, "Remove read_page callback from XLogReader.". I couldn't easily revert

Re: WIP: WAL prefetch (another approach)

2021-11-26 Thread Tomas Vondra
On 11/26/21 22:16, Thomas Munro wrote: On Fri, Nov 26, 2021 at 11:32 AM Tomas Vondra wrote: The results are pretty good / similar to previous results. Replaying the 1h worth of work on a smaller machine takes ~5:30h without prefetching (master or with prefetching disabled). With prefetching

Re: WIP: WAL prefetch (another approach)

2021-11-26 Thread Thomas Munro
On Fri, Nov 26, 2021 at 11:32 AM Tomas Vondra wrote: > The results are pretty good / similar to previous results. Replaying the > 1h worth of work on a smaller machine takes ~5:30h without prefetching > (master or with prefetching disabled). With prefetching enabled this > drops to ~2h (default

Re: WIP: WAL prefetch (another approach)

2021-11-25 Thread Tomas Vondra
Hi, It's great you posted a new version of this patch, so I took a brief look at it. The code seems in pretty good shape, I haven't found any real issues - just two minor comments: This seems a bit strange: #define DEFAULT_DECODE_BUFFER_SIZE 0x1 Why not to define this as a

Re: WIP: WAL prefetch (another approach)

2021-11-15 Thread Daniel Gustafsson
> On 10 May 2021, at 06:11, Thomas Munro wrote: > On Thu, Apr 22, 2021 at 11:22 AM Stephen Frost wrote: >> I tend to agree with the idea to revert it, perhaps a +0 on that, but if >> others argue it should be fixed in-place, I wouldn’t complain about it. > > Reverted. > > Note: eelpout may

Re: WIP: WAL prefetch (another approach)

2021-05-09 Thread Thomas Munro
On Thu, Apr 22, 2021 at 11:22 AM Stephen Frost wrote: > On Wed, Apr 21, 2021 at 19:17 Thomas Munro wrote: >> On Thu, Apr 22, 2021 at 8:16 AM Thomas Munro wrote: >> ... Personally I think the right thing to do now is to revert it >> and re-propose for 15 early in the cycle, supported with some

Re: WIP: WAL prefetch (another approach)

2021-05-06 Thread Andres Freund
Hi, On 2021-05-04 18:08:35 -0700, Andres Freund wrote: > But the issue that 70b4f82a4b is trying to address seems bigger to > me. The reason it's so easy to hit the issue is that walreceiver does < > 8KB writes into recycled WAL segments *without* zero-filling the tail > end of the page - which

Re: WIP: WAL prefetch (another approach)

2021-05-06 Thread Andres Freund
Hi, On 2021-05-04 09:46:12 -0400, Tom Lane wrote: > Yeah, I have also spent a fair amount of time trying to reproduce it > elsewhere, without success so far. Notably, I've been trying on a > PPC Mac laptop that has a fairly similar CPU to what's in the G4, > though a far slower disk drive. So

Re: WIP: WAL prefetch (another approach)

2021-05-04 Thread Andres Freund
Hi, On 2021-05-04 15:47:41 -0400, Tom Lane wrote: > BTW, that conclusion shouldn't distract us from the very real bug > that Andres identified. I was just scraping the buildfarm logs > concerning recent failures, and I found several recent cases > that match the symptom he reported: > [...] >

Re: WIP: WAL prefetch (another approach)

2021-05-04 Thread Tom Lane
I wrote: > I suppose that if we're unable to reproduce it on at least one other box, > we have to write it off as hardware flakiness. BTW, that conclusion shouldn't distract us from the very real bug that Andres identified. I was just scraping the buildfarm logs concerning recent failures, and I

Re: WIP: WAL prefetch (another approach)

2021-05-04 Thread Tom Lane
Tomas Vondra writes: > On 5/3/21 7:42 AM, Thomas Munro wrote: >> Hmm, yeah that does seem plausible. It would be nice to see a report >> from any other system though. I'm still trying, and reviewing... > FWIW I've run the test (make installcheck-parallel in a loop) on four > different

Re: WIP: WAL prefetch (another approach)

2021-05-04 Thread Tomas Vondra
On 5/3/21 7:42 AM, Thomas Munro wrote: On Sun, May 2, 2021 at 3:16 PM Tom Lane wrote: That last point means that there was some hard-to-hit problem even before any of the recent WAL-related changes. However, 323cbe7c7 (Remove read_page callback from XLogReader) increased the failure rate

Re: WIP: WAL prefetch (another approach)

2021-05-02 Thread Thomas Munro
On Sun, May 2, 2021 at 3:16 PM Tom Lane wrote: > That last point means that there was some hard-to-hit problem even > before any of the recent WAL-related changes. However, 323cbe7c7 > (Remove read_page callback from XLogReader) increased the failure > rate by at least a factor of 5, and

Re: WIP: WAL prefetch (another approach)

2021-05-02 Thread Thomas Munro
On Thu, Apr 29, 2021 at 12:24 PM Tom Lane wrote: > Andres Freund writes: > > On 2021-04-28 19:24:53 -0400, Tom Lane wrote: > >> IOW, we've spent over twice as many CPU cycles shipping data to the > >> standby as we did in applying the WAL on the standby. > > > I don't really know how the time

Re: WIP: WAL prefetch (another approach)

2021-05-01 Thread Tom Lane
Thomas Munro writes: > On Thu, Apr 29, 2021 at 4:45 AM Tom Lane wrote: >> Andres Freund writes: >>> Tom, any chance you could check if your machine repros the issue before >>> these commits? >> Wilco, but it'll likely take a little while to get results ... > FWIW I also chewed through many

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Thomas Munro
On Thu, Apr 29, 2021 at 3:14 PM Andres Freund wrote: > To me it looks like a smaller version of the problem is present in < 14, > albeit only when the page header is at a record boundary. In that case > we don't validate the page header immediately, only once it's completely > read. But we do

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Tom Lane
Andres Freund writes: > I was now able to reproduce the problem again, and I'm afraid that the > bug I hit is likely separate from Tom's. Yeah, I think so --- the symptoms seem quite distinct. My score so far today on the G4 is: 12 error-free regression test cycles on b3ee4c503 (plus one more

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Andres Freund
Hi, On 2021-04-28 17:59:22 -0700, Andres Freund wrote: > I can however say that pg_waldump on the standby's pg_wal does also > fail. The failure as part of the backend is "invalid memory alloc > request size", whereas in pg_waldump I get the much more helpful: > pg_waldump: fatal: error in WAL

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Andres Freund
Hi, On 2021-04-28 17:59:22 -0700, Andres Freund wrote: > I can however say that pg_waldump on the standby's pg_wal does also > fail. The failure as part of the backend is "invalid memory alloc > request size", whereas in pg_waldump I get the much more helpful: > pg_waldump: fatal: error in WAL

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Andres Freund
Hi, On 2021-04-28 20:24:43 -0400, Tom Lane wrote: > Andres Freund writes: > > Oh! I was about to ask how much shared buffers your primary / standby > > have. > Default configurations, so 128MB each. I thought that possibly initdb would detect less or something... I assume this is 32bit? I did

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Tom Lane
Andres Freund writes: > On 2021-04-28 19:24:53 -0400, Tom Lane wrote: >> IOW, we've spent over twice as many CPU cycles shipping data to the >> standby as we did in applying the WAL on the standby. > I don't really know how the time calculation works on mac. Is there a > chance it includes time

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Andres Freund
Hi, On 2021-04-28 19:24:53 -0400, Tom Lane wrote: > But I happened to notice the accumulated CPU time for the background > processes: > > USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND > tgl 19048 0.0 4.4 229952 92196 ?? Ss 3:19PM 19:59.19 >

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Tom Lane
Thomas Munro writes: > FWIW I also chewed through many megawatts trying to reproduce this on > a PowerPC system in 64 bit big endian mode, with an emulator. No > cigar. However, it's so slow that I didn't make it to 10 runs... Speaking of megawatts ... my G4 has now finished about ten cycles

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Thomas Munro
On Thu, Apr 29, 2021 at 4:45 AM Tom Lane wrote: > Andres Freund writes: > > Tom, any chance you could check if your machine repros the issue before > > these commits? > > Wilco, but it'll likely take a little while to get results ... FWIW I also chewed through many megawatts trying to reproduce

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Tom Lane
Andres Freund writes: > Tom, any chance you could check if your machine repros the issue before > these commits? Wilco, but it'll likely take a little while to get results ... regards, tom lane

Re: WIP: WAL prefetch (another approach)

2021-04-28 Thread Andres Freund
Hi, On 2021-04-22 13:59:58 +1200, Thomas Munro wrote: > On Thu, Apr 22, 2021 at 1:21 PM Tom Lane wrote: > > I've also tried to reproduce on 32-bit and 64-bit Intel, without > > success. So if this is real, maybe it's related to being big-endian > > hardware? But it's also quite sensitive to

Re: WIP: WAL prefetch (another approach)

2021-04-21 Thread Tom Lane
Andres Freund writes: > On 2021-04-21 21:21:05 -0400, Tom Lane wrote: >> What I'm doing is running the core regression tests with a single >> standby (on the same machine) and wal_consistency_checking = all. > Do you run them over replication, or sequentially by storing data into > an archive?
