Re: Handing off SLRU fsyncs to the checkpointer

2021-01-04 Thomas Munro
On Mon, Jan 4, 2021 at 3:35 AM Tomas Vondra wrote:
> Seems this commit left behind a couple unnecessary prototypes in a bunch of header files. In particular, it removed these functions
>
> - ShutdownCLOG();
> - ShutdownCommitTs();
> - ShutdownSUBTRANS();
> - ShutdownMultiXact();

Thanks.

Re: Handing off SLRU fsyncs to the checkpointer

2021-01-03 Tomas Vondra
On 9/25/20 9:09 AM, Thomas Munro wrote:
> On Fri, Sep 25, 2020 at 12:53 PM Thomas Munro wrote:
> > Here's a new version. The final thing I'm contemplating before pushing this is whether there may be hidden magical dependencies in the order of operations in CheckPointGuts(), which I've changed

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-25 Thomas Munro
On Fri, Sep 25, 2020 at 12:53 PM Thomas Munro wrote:
> Here's a new version. The final thing I'm contemplating before pushing this is whether there may be hidden magical dependencies in the order of operations in CheckPointGuts(), which I've changed around. Andres, any comments?

I nagged

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-24 Thomas Munro
On Fri, Sep 25, 2020 at 12:05 PM Tom Lane wrote:
> Thomas Munro writes:
> > Tom, do you have any thoughts on ShutdownCLOG() etc?
>
> Hm, if we cannot reach that without first completing a shutdown checkpoint, it does seem a little pointless.

Thanks for the sanity check.

> It'd likely be a

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-24 Tom Lane
Thomas Munro writes:
> Tom, do you have any thoughts on ShutdownCLOG() etc?

Hm, if we cannot reach that without first completing a shutdown checkpoint, it does seem a little pointless. It'd likely be a good idea to add a comment to CheckPointCLOG et al explaining that we expect $what-exactly to

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-24 Thomas Munro
On Wed, Sep 23, 2020 at 1:56 PM Thomas Munro wrote:
> As for the ShutdownXXX() functions, I haven't yet come up with any reason for this code to exist. Emboldened by a colleague's inability to explain to me what that code is doing for us, here is a new version that just rips it all out.

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-22 Thomas Munro
On Tue, Sep 22, 2020 at 9:08 AM Thomas Munro wrote:
> On Mon, Sep 21, 2020 at 2:19 PM Thomas Munro wrote:
> > While scanning for comments and identifier names that needed updating, I realised that this patch changed the behaviour of the ShutdownXXX() functions, since they currently flush

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-21 Thomas Munro
On Mon, Sep 21, 2020 at 2:19 PM Thomas Munro wrote:
> While scanning for comments and identifier names that needed updating, I realised that this patch changed the behaviour of the ShutdownXXX() functions, since they currently flush the SLRUs but are not followed by a checkpoint. I'm not

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-20 Thomas Munro
On Sun, Sep 20, 2020 at 12:40 PM Thomas Munro wrote:
> On Sat, Sep 19, 2020 at 5:06 PM Thomas Munro wrote:
> > In the meantime, from the low-hanging-fruit department, here's a new version of the SLRU-fsync-offload patch. The only changes are a tweaked commit message, and adoption of C99

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-19 Thomas Munro
On Sat, Sep 19, 2020 at 5:06 PM Thomas Munro wrote:
> In the meantime, from the low-hanging-fruit department, here's a new version of the SLRU-fsync-offload patch. The only changes are a tweaked commit message, and adoption of C99 designated initialisers for the function table, so {
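For readers unfamiliar with the idiom mentioned above, here is a minimal sketch of a per-subsystem function table built with C99 designated initialisers. It is an illustration only, not the patch's code: the enum values, the `HandlerOps` struct and the stand-in callbacks are all hypothetical. Each entry is placed by array index and by field name, so reordering the enum or adding a struct member cannot silently misalign the table, and unmentioned members are zero (NULL).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler IDs, one per subsystem that can request fsyncs. */
typedef enum
{
	HANDLER_MD = 0,
	HANDLER_CLOG,
	HANDLER_COMMIT_TS,
	HANDLER_COUNT				/* number of handlers; must stay last */
} HandlerId;

/* Hypothetical per-handler callbacks. */
typedef struct
{
	int			(*sync_file) (int segno);
	int			(*unlink_file) (int segno);	/* optional, may be NULL */
} HandlerOps;

/* Stand-in callbacks: they just return a tag so dispatch can be checked. */
static int	md_sync(int segno) { return 100 + segno; }
static int	md_unlink(int segno) { return -segno; }
static int	clog_sync(int segno) { return 200 + segno; }
static int	commit_ts_sync(int segno) { return 300 + segno; }

/*
 * [index] selects the array slot and .field selects the struct member.
 * Members that are not mentioned are zero-initialised, i.e. NULL here.
 */
static const HandlerOps handler_ops[HANDLER_COUNT] = {
	[HANDLER_MD] = {
		.sync_file = md_sync,
		.unlink_file = md_unlink,
	},
	[HANDLER_CLOG] = {
		.sync_file = clog_sync,
	},
	[HANDLER_COMMIT_TS] = {
		.sync_file = commit_ts_sync,
	},
};

/* Dispatch a sync request to the owning handler's callback. */
static int
dispatch_sync(HandlerId id, int segno)
{
	assert((unsigned) id < (unsigned) HANDLER_COUNT);
	assert(handler_ops[id].sync_file != NULL);
	return handler_ops[id].sync_file(segno);
}

/* Returns 1 if the handler provides an unlink callback, else 0. */
static int
has_unlink(HandlerId id)
{
	assert((unsigned) id < (unsigned) HANDLER_COUNT);
	return handler_ops[id].unlink_file != NULL;
}
```

With positional initialisers, inserting a new enum value in the middle would shift every later entry by one; the designated form keeps each entry tied to its name.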

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-18 Thomas Munro
On Mon, Aug 31, 2020 at 8:50 PM Jakub Wartak wrote:
> - IO_URING - gives a lot of promise here I think, is it even planned to be shown for PgSQL14 cycle ? Or it's more like PgSQL15?

I can't answer that, but I've played around with the prototype quite a bit, and thought quite a lot about how

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-31 Jakub Wartak
Hi Thomas, hackers,

>> ... %CPU ... COMMAND
>> ... 97.4 ... postgres: startup recovering 00010089
> So, what else is pushing this thing off CPU, anyway? For one thing, I guess it might be stalling while reading the WAL itself, because (1) we only read it 8KB at a time,

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Thomas Munro
On Sat, Aug 29, 2020 at 12:43 AM Jakub Wartak wrote:
> ... %CPU ... COMMAND
> ... 97.4 ... postgres: startup recovering 00010089

So, what else is pushing this thing off CPU, anyway? For one thing, I guess it might be stalling while reading the WAL itself, because (1) we only

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Thomas Munro
On Sat, Aug 29, 2020 at 12:43 AM Jakub Wartak wrote:
> USER        PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
> postgres 120935  0.9  0.0 866052  3824 ?    Ss   09:47   0:00 postgres: checkpointer
> postgres 120936 61.9  0.0 865796  3824 ?    Rs   09:47   0:22

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Jakub Wartak
Hi Thomas, hackers,

>> > To move these writes out of recovery's way, we should probably just run the bgwriter process during crash recovery. I'm going to look into that.
>>
>> Sounds awesome.
>
> I wrote a quick and dirty experimental patch to try that. I can't see any benefit from it

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-27 Thomas Munro
On Thu, Aug 27, 2020 at 8:48 PM Jakub Wartak wrote:
> >> 29.62% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
> >> ---copy_user_enhanced_fast_string
> >>    |--17.98%--copyin
> >> [..]
> >>    | __pwrite_nocancel
> >>

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-27 Thomas Munro
On Thu, Aug 27, 2020 at 8:48 PM Jakub Wartak wrote:
> I've tried to get cache misses ratio via PMCs, apparently on EC2 they are (even on bigger) reporting as not-supported or zeros.

I heard some of the counters are only allowed on their dedicated instance types.

> However interestingly the

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-27 Jakub Wartak
Hi Alvaro, Thomas, hackers

>> 14.69% postgres postgres [.] hash_search_with_hash_value
>> ---hash_search_with_hash_value
>>    |--9.80%--BufTableLookup [..]
>>    --4.90%--smgropen
>>

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-27 Jakub Wartak
Hi Thomas / hackers,

>> The append-only bottleneck appears to be limited by syscalls/s due to small block size even with everything in FS cache (but not in shared buffers, please compare with TEST1 as there was no such bottleneck at all):
>>
>> 29.62% postgres [kernel.kallsyms]

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-26 Thomas Munro
On Thu, Aug 27, 2020 at 6:15 AM Alvaro Herrera wrote:
> > --4.90%--smgropen
> > |--2.86%--ReadBufferWithoutRelcache
>
> Looking at an earlier report of this problem I was thinking whether it'd make sense to replace SMgrRelationHash with a simplehash
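To make the suggestion above concrete: PostgreSQL's simplehash.h generates open-addressing hash tables from macros, as opposed to the general-purpose dynahash behind hash_search_with_hash_value. The sketch below is a hedged illustration of the open-addressing idea only, not of simplehash.h's API. It uses plain linear probing over a fixed-size table; the key layout, the FNV-1a-style hash and every function name are hypothetical, and deletion (which needs tombstones or backward shifting) is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: a (tablespace, database, relation) triple. */
typedef struct
{
	uint32_t	spc;
	uint32_t	db;
	uint32_t	rel;
} RelKey;

typedef struct
{
	RelKey		key;
	int			value;			/* stand-in for a cached per-relation object */
	bool		used;
} Entry;

#define TABLE_SIZE 64			/* power of two, so masking replaces modulo */

typedef struct
{
	Entry		slots[TABLE_SIZE];
	int			count;
} RelTable;

static void
reltable_init(RelTable *t)
{
	assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0);
	memset(t, 0, sizeof(*t));
}

/* FNV-1a-style mixing over the three 32-bit key words. */
static uint32_t
relkey_hash(const RelKey *k)
{
	uint32_t	words[3] = {k->spc, k->db, k->rel};
	uint32_t	h = 2166136261u;

	for (int i = 0; i < 3; i++)
	{
		h ^= words[i];
		h *= 16777619u;
	}
	return h;
}

static bool
relkey_equal(const RelKey *a, const RelKey *b)
{
	return a->spc == b->spc && a->db == b->db && a->rel == b->rel;
}

/*
 * Return the slot holding key, or the first free slot on its probe path.
 * Returns NULL only if the table is full and the key is absent.
 */
static Entry *
probe(RelTable *t, const RelKey *key)
{
	uint32_t	pos = relkey_hash(key) & (TABLE_SIZE - 1);

	for (int i = 0; i < TABLE_SIZE; i++)
	{
		Entry	   *e = &t->slots[pos];

		if (!e->used || relkey_equal(&e->key, key))
			return e;
		pos = (pos + 1) & (TABLE_SIZE - 1);	/* linear probing */
	}
	return NULL;
}

/* Insert or update; returns false if the table is full. */
static bool
reltable_put(RelTable *t, RelKey key, int value)
{
	Entry	   *e = probe(t, &key);

	if (e == NULL)
		return false;
	if (!e->used)
	{
		e->used = true;
		e->key = key;
		t->count++;
	}
	e->value = value;
	return true;
}

/* Look up key; on success store its value in *value and return true. */
static bool
reltable_get(RelTable *t, RelKey key, int *value)
{
	Entry	   *e = probe(t, &key);

	if (e == NULL || !e->used)
		return false;
	*value = e->value;
	return true;
}
```

Because the key type, hash and comparison are known at compile time here, the compiler can inline all of them; a generic table that calls hash and match functions through pointers cannot.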

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-26 Alvaro Herrera
On 2020-Aug-25, Jakub Wartak wrote:
> Turning on/off the defer SLRU patch and/or fsync doesn't seem to make any difference, so if anyone is curious the next sets of append-only bottlenecks is like below:
>
> 14.69% postgres postgres [.] hash_search_with_hash_value
>

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-26 Alvaro Herrera
On 2020-Aug-25, Andres Freund wrote:
> Hi,
>
> On 2020-08-26 15:58:14 +1200, Thomas Munro wrote:
> > > --12.51%--compactify_tuples
> > > PageRepairFragmentation
> > > heap2_redo
> > > StartupXLOG
> > >

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Andres Freund
Hi,

On 2020-08-26 15:58:14 +1200, Thomas Munro wrote:
> > --12.51%--compactify_tuples
> > PageRepairFragmentation
> > heap2_redo
> > StartupXLOG
>
> I wonder if there is something higher level that

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Thomas Munro
On Tue, Aug 25, 2020 at 9:16 PM Jakub Wartak wrote:
> I just wanted to help testing this patch (defer SLRU fsyncs during recovery) and also faster compactify_tuples() patch [2] as both are related to the WAL recovery performance in which I'm interested in. This is my first message to
>

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Jakub Wartak
On Wed, Aug 12, 2020 at 6:06 PM Thomas Munro wrote:
> [patch]

Hi Thomas / hackers,

I just wanted to help testing this patch (defer SLRU fsyncs during recovery) and also faster compactify_tuples() patch [2] as both are related to the WAL recovery performance in which I'm interested in. This is

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-12 Thomas Munro
On Wed, Aug 12, 2020 at 6:06 PM Thomas Munro wrote:
> [patch]

Bitrot, rebased, no changes.

> Yeah, the combined effect of these two patches is better than I expected. To be clear though, I was only measuring the time between the "redo starts at ..." and "redo done at ..." messages, since

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-12 Thomas Munro
On Sat, Aug 8, 2020 at 2:44 AM Robert Haas wrote:
> On Wed, Aug 5, 2020 at 2:01 AM Thomas Munro wrote:
> > * Master is around 11% faster than last week before commit c5315f4f "Cache smgrnblocks() results in recovery."
> > * This patch gives a similar speedup, bringing the total to around 25%

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-07 Robert Haas
On Wed, Aug 5, 2020 at 2:01 AM Thomas Munro wrote:
> * Master is around 11% faster than last week before commit c5315f4f "Cache smgrnblocks() results in recovery."
> * This patch gives a similar speedup, bringing the total to around 25% faster than last week (the time is ~20% less, the WAL
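The two figures in the last bullet express the same change in different units: "25% faster" is a ratio of old to new time, while "~20% less" is a reduction in elapsed time. Assuming "~20% less" refers to elapsed recovery time (the sentence is cut off), the first line below shows the conversion. The second line checks the compounding, assuming "a similar speedup" means roughly another 11%; that lands near 23%, consistent with "around 25%" given the rounding in both figures.

```latex
\[
\text{speedup} = \frac{T_{\text{before}}}{T_{\text{after}}}
  = \frac{T_{\text{before}}}{(1 - 0.20)\,T_{\text{before}}}
  = \frac{1}{0.80} = 1.25
\]
\[
1.11 \times 1.11 \approx 1.23
\]
```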

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-05 Thomas Munro
On Tue, Aug 4, 2020 at 6:02 PM Thomas Munro wrote:
> ... speedup of around 6% ...

I did some better testing. OS: Linux, storage: consumer SSD. I repeatedly ran crash recovery on 3.3GB worth of WAL generated with 8M pgbench transactions. I tested 3 different builds 7 times each and used

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-04 Thomas Munro
On Wed, Feb 12, 2020 at 9:54 PM Thomas Munro wrote:
> In commit 3eb77eba we made it possible for any subsystem that wants a file to be flushed as part of the next checkpoint to ask the checkpointer to do that, as previously only md.c could do.

Hello,

While working on recovery performance, I

Handing off SLRU fsyncs to the checkpointer

2020-02-12 Thomas Munro
Hi,

In commit 3eb77eba we made it possible for any subsystem that wants a file to be flushed as part of the next checkpoint to ask the checkpointer to do that, as previously only md.c could do. In the past, foreground CLOG flush stalls were a problem, but then commit 33aaa139 cranked up the
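The mechanism described above works like this: a process that has written a file hands the checkpointer a tag naming that file, and the checkpointer flushes each tagged file once at the next checkpoint. The sketch below is a simplified, single-process illustration of that pattern. It is not PostgreSQL's sync.c: `FlushTag`, `SyncQueue`, `request_sync()`, `process_sync_requests()` and the counting stand-in for fsync() are all hypothetical. A real implementation also needs shared memory and real fsync() calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag naming one file to flush. */
typedef struct
{
	int			handler;		/* owning subsystem, e.g. an SLRU such as CLOG */
	int			segno;			/* segment number within that subsystem */
} FlushTag;

#define MAX_PENDING 128

/* Requests absorbed by the checkpointer, waiting for the next checkpoint. */
typedef struct
{
	FlushTag	pending[MAX_PENDING];
	size_t		npending;
} SyncQueue;

/* Stand-in for fsync(): only counts how many files would be flushed. */
static size_t fsync_calls = 0;

static void
count_fsync(const FlushTag *tag)
{
	(void) tag;
	fsync_calls++;
}

static bool
tag_equal(const FlushTag *a, const FlushTag *b)
{
	return a->handler == b->handler && a->segno == b->segno;
}

/*
 * Called after writing out a dirty page: instead of fsyncing now, remember
 * the file.  Repeated requests for the same file collapse into one entry.
 * Returns false if the queue is full; in this sketch the caller would then
 * have to fsync the file itself.
 */
static bool
request_sync(SyncQueue *q, FlushTag tag)
{
	for (size_t i = 0; i < q->npending; i++)
	{
		if (tag_equal(&q->pending[i], &tag))
			return true;		/* already queued */
	}
	if (q->npending == MAX_PENDING)
		return false;
	q->pending[q->npending++] = tag;
	return true;
}

/*
 * At checkpoint time: flush every pending file exactly once, then empty the
 * queue.  Returns the number of files flushed.
 */
static size_t
process_sync_requests(SyncQueue *q, void (*do_fsync) (const FlushTag *))
{
	size_t		n = q->npending;

	for (size_t i = 0; i < n; i++)
		do_fsync(&q->pending[i]);
	q->npending = 0;
	return n;
}
```

Deduplication is what turns many writes to the same segment into a single flush at checkpoint time. The linear scan keeps the sketch short; with many pending files, a hash table keyed on the tag would be the more natural choice.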