Re: Refactoring the checkpointer's fsync request queue

2019-04-29 Thread Alvaro Herrera
On 2019-Apr-05, Thomas Munro wrote: > Ok, here is a patch that adds a one-typedef header and uses > SegmentIndex to replace all cases of BlockNumber and int holding a > segment number (where as an "index" or a "count"). Hmm, I now see (while doing the pg_checksum translation) that this patch didn

Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Thomas Munro
On Thu, Apr 4, 2019 at 11:47 PM Thomas Munro wrote: > Pushed. Thanks for all the work on this! I managed to break this today while testing with RELSEG_SIZE set to 1 block (= squillions of 8kb files). The problem is incorrect arguments to _mdfd_getseg(), in code added recently by me. Without th

Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Thomas Munro
On Fri, Apr 5, 2019 at 10:53 AM Thomas Munro wrote: > Ok, here is a patch that adds a one-typedef header and uses > SegmentIndex to replace all cases of BlockNumber and int holding a > segment number (where as an "index" or a "count"). (sorry, I meant "SegmentNumber", not "SegmentIndex") -- Tho

Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Thomas Munro
On Fri, Apr 5, 2019 at 2:03 AM Alvaro Herrera wrote: > On 2019-Apr-04, Thomas Munro wrote: > > I don't think it's project policy to put a single typedef into its own > > header like that, and I'm not sure where else to put it. > > shrug. Looks fine to me. I suppose if we don't have it anywhere,

Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Alvaro Herrera
On 2019-Apr-04, Thomas Munro wrote: > I don't think it's project policy to put a single typedef into its own > header like that, and I'm not sure where else to put it. shrug. Looks fine to me. I suppose if we don't have it anywhere, it's just because we haven't needed that particular trick yet.

Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Thomas Munro
On Thu, Apr 4, 2019 at 6:41 PM Shawn Debnath wrote: > On Thu, Apr 04, 2019 at 05:39:14PM +1300, Thomas Munro wrote: > > On Thu, Apr 4, 2019 at 5:36 PM Andres Freund wrote: > > > On 2019-04-03 21:19:45 -0700, Shawn Debnath wrote: > > > > +typedef struct FileTag > > > > +{ > > > > + int16

Re: Refactoring the checkpointer's fsync request queue

2019-04-03 Thread Thomas Munro
On Thu, Apr 4, 2019 at 5:36 PM Andres Freund wrote: > On 2019-04-03 21:19:45 -0700, Shawn Debnath wrote: > > +typedef struct FileTag > > +{ > > + int16 handler;/* SyncRequstHandler value, > > saving space */ > > + int16 forknum;/* ForkNu

Re: Refactoring the checkpointer's fsync request queue

2019-04-03 Thread Andres Freund
On 2019-04-03 21:19:45 -0700, Shawn Debnath wrote: > On Thu, Apr 04, 2019 at 02:01:14PM +1300, Thomas Munro wrote: > > On Thu, Apr 4, 2019 at 11:39 AM Thomas Munro wrote: > > > ... Perhaps > > > that is an argument for putting the sync handler number *inside* the > > > FileTag, since we currently

Re: Refactoring the checkpointer's fsync request queue

2019-04-03 Thread Thomas Munro
On Thu, Apr 4, 2019 at 11:39 AM Thomas Munro wrote: > ... Perhaps > that is an argument for putting the sync handler number *inside* the > FileTag, since we currently intend to do that with smgr IDs in > BufferTag (stealing space from ForkNumber). Here is a version like that. I like it better th

Re: Refactoring the checkpointer's fsync request queue

2019-04-03 Thread Thomas Munro
On Thu, Apr 4, 2019 at 10:44 AM Shawn Debnath wrote: > On Thu, Apr 04, 2019 at 01:08:31AM +1300, Thomas Munro wrote: > > As a sanity check on the programming interface this thing gives you, I > > tried teaching the SLRUs to use the fsync queue. I finished up making > > a few small improvements, b

Re: Refactoring the checkpointer's fsync request queue

2019-04-03 Thread Thomas Munro
On Tue, Apr 2, 2019 at 11:09 PM Thomas Munro wrote: > I'm going to do some more testing and tidying tomorrow (for example I > think the segment.h header is silly and I'd like to remove that), and > commit this. As a sanity check on the programming interface this thing gives you, I tried teaching

Re: Refactoring the checkpointer's fsync request queue

2019-04-02 Thread Thomas Munro
On Wed, Mar 27, 2019 at 5:48 AM Shawn Debnath wrote: > On Tue, Mar 26, 2019 at 09:22:56AM -0400, Robert Haas wrote: > > On Tue, Mar 26, 2019 at 12:20 AM Thomas Munro > > wrote: > > > Here's a fixup patch. 0001 is the same as Shawn's v12 patch, and 0002 > > > has my proposed changes to switch to

Re: Refactoring the checkpointer's fsync request queue

2019-03-26 Thread Robert Haas
On Tue, Mar 26, 2019 at 12:20 AM Thomas Munro wrote: > I've been trying to decide if that is a problem. Maybe there is a > performance angle, and I'm also wondering if it might increase the > risk of missing a write-back error. Of course we'll find a proper > solution to that problem (perhaps fd

Re: Refactoring the checkpointer's fsync request queue

2019-03-25 Thread Thomas Munro
On Wed, Mar 13, 2019 at 2:27 PM Thomas Munro wrote: > [...] Aside from some refactoring which I > think looks good anyway and prepares for future patches, the main > effect of this patch is to force the checkpointer to open and close > the files every time which seems OK to me. I've been trying t

Re: Refactoring the checkpointer's fsync request queue

2019-03-12 Thread Thomas Munro
On Wed, Mar 13, 2019 at 2:00 PM Shawn Debnath wrote: > So ... wondering if there are any other left over items for this patch > or is it good to go? I imagine there's at least a couple of us who would > love to see this get in for PG12. I rebased my WIP undo stuff[1] (targeting 13) on top of this

Re: Refactoring the checkpointer's fsync request queue

2019-03-06 Thread Thomas Munro
On Wed, Mar 6, 2019 at 6:16 AM Shawn Debnath wrote: > On Tue, Mar 05, 2019 at 11:53:16AM -0500, Tom Lane wrote: > > Thomas Munro writes: > > > Why do we need to include fmgr.h in md.h? > > > > More generally, any massive increase in an include file's inclusions > > is probably a sign that you nee

Re: Refactoring the checkpointer's fsync request queue

2019-03-05 Thread Tom Lane
Thomas Munro writes: > +#include "fmgr.h" > +#include "storage/block.h" > +#include "storage/relfilenode.h" > +#include "storage/smgr.h" > +#include "storage/sync.h" > Why do we need to include fmgr.h in md.h? More generally, any massive increase in an include file's inclusions is probably a sig

Re: Refactoring the checkpointer's fsync request queue

2019-03-05 Thread Thomas Munro
On Wed, Mar 6, 2019 at 5:07 AM Shawn Debnath wrote: > Confirmed. Patch shows 8900 ms vs 192 ms on master for the insert test. > Interesting! It's reproducible so should be able to figure out what's > going on. The only thing we do in ForwardSyncRequest() is split up the 8 > bits into 2x4 bits and

Re: Refactoring the checkpointer's fsync request queue

2019-03-05 Thread Thomas Munro
On Tue, Mar 5, 2019 at 2:25 PM Shawn Debnath wrote: > [v11 patch] Thanks. Hmm, something is wrong here because make check is dramatically slower -- for example the "insert" test runs in ~8-13 seconds instead of the usual ~0.2 seconds according to Travis, AppVeyor and my local FreeBSD system (not

Re: Refactoring the checkpointer's fsync request queue

2019-03-01 Thread Robert Haas
On Fri, Mar 1, 2019 at 3:35 PM Shawn Debnath wrote: > On Fri, Mar 01, 2019 at 03:03:19PM -0500, Robert Haas wrote: > > On Fri, Mar 1, 2019 at 2:36 PM Shawn Debnath wrote: > > > I disagree, at least with combining and retaining enums. Encoding all > > > the possible request types with the current,

Re: Refactoring the checkpointer's fsync request queue

2019-03-01 Thread Thomas Munro
On Sat, Mar 2, 2019 at 9:35 AM Shawn Debnath wrote: > On Fri, Mar 01, 2019 at 03:03:19PM -0500, Robert Haas wrote: > > On Fri, Mar 1, 2019 at 2:36 PM Shawn Debnath wrote: > > > I disagree, at least with combining and retaining enums. Encoding all > > > the possible request types with the current,

Re: Refactoring the checkpointer's fsync request queue

2019-03-01 Thread Robert Haas
On Fri, Mar 1, 2019 at 2:36 PM Shawn Debnath wrote: > I disagree, at least with combining and retaining enums. Encoding all > the possible request types with the current, planned and future SMGRs > would cause a sheer explosion in the number of enum values. How big of an explosion would it be?

Re: Refactoring the checkpointer's fsync request queue

2019-03-01 Thread Thomas Munro
On Sat, Mar 2, 2019 at 8:36 AM Shawn Debnath wrote: > On Fri, Mar 01, 2019 at 01:15:21PM -0500, Robert Haas wrote: > > > > > > > > +typedef enum syncrequestowner > > > > +{ > > > > + SYNC_MD = 0 /* md smgr */ > > > > +} syncrequestowner; > > > > > > > > I have a feeling that Andr

Re: Refactoring the checkpointer's fsync request queue

2019-03-01 Thread Robert Haas
On Fri, Mar 1, 2019 at 12:43 PM Andres Freund wrote: > Obviously it's nicer looking this way, but OTOH, that means we have to > send more data over the queue, because we can't easily combine the > request + "owner". I don't have too strong feelings about it though. Yeah, I would lean toward combi

Re: Refactoring the checkpointer's fsync request queue

2019-03-01 Thread Andres Freund
Hi, On 2019-03-01 23:17:27 +1300, Thomas Munro wrote: > @@ -8616,7 +8617,7 @@ CreateCheckPoint(int flags) > * the REDO pointer. Note that smgr must not do anything that'd have > to > * be undone if we decide no checkpoint is needed. > */ > - smgrpreckpt(); > +

Re: Refactoring the checkpointer's fsync request queue

2019-03-01 Thread Thomas Munro
On Thu, Feb 28, 2019 at 10:43 AM Thomas Munro wrote: > On Thu, Feb 28, 2019 at 10:27 AM Shawn Debnath wrote: > > We had a quick offline discussion to get on the same page and we agreed > > to move forward with Andres' approach above. Attached is patch v10. > > Here's the overview of the patch: >

Re: Refactoring the checkpointer's fsync request queue

2019-02-27 Thread Thomas Munro
On Thu, Feb 28, 2019 at 10:27 AM Shawn Debnath wrote: > We had a quick offline discussion to get on the same page and we agreed > to move forward with Andres' approach above. Attached is patch v10. > Here's the overview of the patch: Thanks. I will review, and try to rebase my undo patches on to

Re: Refactoring the checkpointer's fsync request queue

2019-02-22 Thread Andres Freund
On 2019-02-22 15:45:50 -0800, Shawn Debnath wrote: > > > Well even if you do it with individual segment cancel messages for > > > relations, you still need a way to deal with whole-database drops > > > (generating the cancels for every segment in every relation in the > > > database would be nuts),

Re: Refactoring the checkpointer's fsync request queue

2019-02-22 Thread Andres Freund
Hi, On 2019-02-23 11:59:04 +1300, Thomas Munro wrote: > On Sat, Feb 23, 2019 at 11:48 AM Andres Freund wrote: > > > Yeah I suggested dynamic registration to avoid the problem that eg > > > src/backend/storage/sync.c otherwise needs to forward declare > > > md_tagtopath(), undofile_tagtopath(), sl

Re: Refactoring the checkpointer's fsync request queue

2019-02-22 Thread Thomas Munro
On Sat, Feb 23, 2019 at 11:48 AM Andres Freund wrote: > > Yeah I suggested dynamic registration to avoid the problem that eg > > src/backend/storage/sync.c otherwise needs to forward declare > > md_tagtopath(), undofile_tagtopath(), slru_tagtopath(), ..., or maybe > > #include etc, which seemed l

Re: Refactoring the checkpointer's fsync request queue

2019-02-22 Thread Andres Freund
Hi, On 2019-02-23 11:42:49 +1300, Thomas Munro wrote: > On Sat, Feb 23, 2019 at 11:15 AM Andres Freund wrote: > > On 2019-02-22 10:18:57 -0800, Shawn Debnath wrote: > > > I think using callbacks is the better path forward than having md or > > > other components issue an invalidate request for ea

Re: Refactoring the checkpointer's fsync request queue

2019-02-22 Thread Thomas Munro
On Sat, Feb 23, 2019 at 11:15 AM Andres Freund wrote: > On 2019-02-22 10:18:57 -0800, Shawn Debnath wrote: > > I think using callbacks is the better path forward than having md or > > other components issue an invalidate request for each and every segment > > which can get quite heavy handed for l

Re: Refactoring the checkpointer's fsync request queue

2019-02-22 Thread Andres Freund
Hi, On 2019-02-22 10:18:57 -0800, Shawn Debnath wrote: > I think using callbacks is the better path forward than having md or > other components issue an invalidate request for each and every segment > which can get quite heavy handed for large databases. I'm not sure I buy this. Unlinking file

Re: Refactoring the checkpointer's fsync request queue

2019-02-20 Thread Thomas Munro
On Thu, Feb 21, 2019 at 12:50 PM Andres Freund wrote: > Thanks for the update! +1, thanks Shawn. It's interesting that you measure performance improvements that IIUC can come only from dropping the bitmapset stuff (or I guess bugs). I don't understand the mechanism for that improvement yet. Th

Re: Refactoring the checkpointer's fsync request queue

2019-02-20 Thread Andres Freund
Hi, Thanks for the update! On 2019-02-20 15:27:40 -0800, Shawn Debnath wrote: > As promised, here's a patch that addresses the points discussed by > Andres and Thomas at FOSDEM. As a result of how we want checkpointer to > track what files to fsync, the pending ops table now integrates the > f

Re: Refactoring the checkpointer's fsync request queue

2019-02-15 Thread Andres Freund
it might be better for the fsync request queue to work > in terms of pathnames. Instead of the approach in this patch, where a > backend sends an fsync request for { reflfilenode, segno } inside > mdwrite(), and then the checkpointer processes the request by calling > smgrdimmedsyncr

Re: Refactoring the checkpointer's fsync request queue

2019-02-12 Thread Thomas Munro
that trip I had a chance to discuss this patch with Andres Freund in person, and he opined that it might be better for the fsync request queue to work in terms of pathnames. Instead of the approach in this patch, where a backend sends an fsync request for { reflfilenode, segno } inside mdwrite(), and

Re: Refactoring the checkpointer's fsync request queue

2019-02-03 Thread Michael Paquier
On Wed, Jan 30, 2019 at 09:59:38PM -0800, Shawn Debnath wrote: > I (finally) got a chance to go through these patches and they look > great. Thank you for working on this! This review is very recent, so I have moved the patch to next CF. -- Michael signature.asc Description: PGP signature

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Andres Freund
Hi, On 2019-01-22 14:53:11 -0600, Kevin Grittner wrote: > On Tue, Jan 22, 2019 at 2:38 PM Andres Freund wrote: > > > close() doesn't trigger an fsync() in general > > What sort of a performance hit was observed when testing the addition > of fsync or fdatasync before any PostgreSQL close() of a

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Kevin Grittner
On Tue, Jan 22, 2019 at 2:38 PM Andres Freund wrote: > close() doesn't trigger an fsync() in general What sort of a performance hit was observed when testing the addition of fsync or fdatasync before any PostgreSQL close() of a writable file, or have we not yet checked that? > https://postgr.es

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Andres Freund
Hi, On 2019-01-22 14:29:23 -0600, Kevin Grittner wrote: > On Tue, Jan 22, 2019 at 12:17 PM Andres Freund wrote: > > > Unfortunately, unless something has changed recently, that patch is > > *not* sufficient to really solve the issue - we don't guarantee that > > there's always an fd preventing t

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Thomas Munro
On Wed, Jan 23, 2019 at 9:29 AM Kevin Grittner wrote: > Can you point to a post explaining how the inode can be evicted? Hi Kevin, To recap the (admittedly confusing) list of problems with Linux fsync or our usage: 1. On Linux < 4.13, the error state can be cleared in various surprising ways s

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Kevin Grittner
On Tue, Jan 22, 2019 at 12:17 PM Andres Freund wrote: > Unfortunately, unless something has changed recently, that patch is > *not* sufficient to really solve the issue - we don't guarantee that > there's always an fd preventing the necessary information from being > evicted from memory: But we

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Andres Freund
Hi,, On 2019-01-22 08:27:48 -0600, Kevin Grittner wrote: > With the help of VMware's Dirk Hohndel (VMware's Chief Open Source > Officer, a VP position near the top of the organization, and a > personal friend of Linus), I have been fortunate enough to make > contact directly with Linus Torvalds to

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Kevin Grittner
On Tue, Jan 22, 2019 at 8:27 AM Kevin Grittner wrote: > With the help of VMware's Dirk Hohndel (VMware's Chief Open Source > Officer, a VP position near the top of the organization, and a > personal friend of Linus), I have been fortunate enough to make > contact directly with Linus Torvalds to d

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Kevin Grittner
With the help of VMware's Dirk Hohndel (VMware's Chief Open Source Officer, a VP position near the top of the organization, and a personal friend of Linus), I have been fortunate enough to make contact directly with Linus Torvalds to discuss this issue. In emails to me he has told me that this pat

Re: Refactoring the checkpointer's fsync request queue

2019-01-02 Thread Thomas Munro
On Wed, Jan 2, 2019 at 11:40 AM Thomas Munro wrote: > For the 0001 patch, I'll probably want to reconsider the naming a bit > ("simple -> "specialized", "generic", ...?), refine (ability to turn > off the small vector optimisation? optional MemoryContext? ability > to extend without copying or z

Re: Refactoring the checkpointer's fsync request queue

2019-01-01 Thread Thomas Munro
On Tue, Jan 1, 2019 at 10:41 AM Thomas Munro wrote: > So, I have split this work into multiple patches. 0001 is a draft > version of some new infrastructure I'd like to propose, 0002 is the > thing originally described by the first two paragraphs in the first > email in this thread, and the rest

Re: Refactoring the checkpointer's fsync request queue

2018-12-31 Thread Thomas Munro
On Sun, Dec 2, 2018 at 1:46 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Mon, Nov 26, 2018 at 11:47 PM Thomas Munro > > wrote: > > > On Fri, Nov 23, 2018 at 5:45 PM Thomas Munro > > > wrote: > > > I do have a new plan though... > > > > Ugh. The plan in my previous email doesn't work,

Re: Refactoring the checkpointer's fsync request queue

2018-12-01 Thread Dmitry Dolgov
> On Mon, Nov 26, 2018 at 11:47 PM Thomas Munro > wrote: > > On Fri, Nov 23, 2018 at 5:45 PM Thomas Munro > wrote: > > I do have a new plan though... > > Ugh. The plan in my previous email doesn't work, I was confused about > the timing of the buffer header update. Back to the drawing board.

Re: Refactoring the checkpointer's fsync request queue

2018-11-26 Thread Thomas Munro
On Fri, Nov 23, 2018 at 5:45 PM Thomas Munro wrote: > I do have a new plan though... Ugh. The plan in my previous email doesn't work, I was confused about the timing of the buffer header update. Back to the drawing board. -- Thomas Munro http://www.enterprisedb.com

Re: Refactoring the checkpointer's fsync request queue

2018-11-20 Thread Robert Haas
On Fri, Nov 16, 2018 at 5:38 PM Thomas Munro wrote: > Or to put it another way, you can't be given a lower sequence number > than another process that has already written, because that other > process must have been given a sequence number before it wrote. OK, that makes sense. -- Robert Haas E

Re: Refactoring the checkpointer's fsync request queue

2018-11-16 Thread Thomas Munro
On Sat, Nov 17, 2018 at 4:05 AM Robert Haas wrote: > On Wed, Nov 14, 2018 at 4:49 PM Andres Freund wrote: > > On 2018-11-14 16:36:49 -0500, Robert Haas wrote: > > > But how do you make reading that counter atomic with the open() itself? > > > > I don't see why it has to be. As long as the "fd gen

Re: Refactoring the checkpointer's fsync request queue

2018-11-16 Thread Thomas Munro
On Thu, Nov 15, 2018 at 5:09 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > While testing this patch with frequent checkpoints I've stumbled upon an > interesting error, that happened already after I finished one test: > > TRAP: FailedAssertion("!(rc > 0)", File: "checkpointer.c", Line: 574) Th

Re: Refactoring the checkpointer's fsync request queue

2018-11-16 Thread Robert Haas
On Wed, Nov 14, 2018 at 4:49 PM Andres Freund wrote: > On 2018-11-14 16:36:49 -0500, Robert Haas wrote: > > But how do you make reading that counter atomic with the open() itself? > > I don't see why it has to be. As long as the "fd generation" assignment > happens before fsync (and writes seconda

Re: Refactoring the checkpointer's fsync request queue

2018-11-14 Thread Andres Freund
On 2018-11-14 16:36:49 -0500, Robert Haas wrote: > On Tue, Nov 13, 2018 at 6:44 PM Thomas Munro > wrote: > > I'm not sure if it matters whether we send the fd before or after the > > write, but we still need some kind of global ordering of fds that can > > order a given fd with respect to writes i

Re: Refactoring the checkpointer's fsync request queue

2018-11-14 Thread Robert Haas
On Tue, Nov 13, 2018 at 6:44 PM Thomas Munro wrote: > > That sounds a little like you are proposing to go back to the way > > things were before 806a2aee3791244bf0f916729bfdb5489936e068 (and, > > belatedly, bf405ba8e460051e715d0a91442b579e590328ce) although I guess > > the division of labor wouldn

Re: Refactoring the checkpointer's fsync request queue

2018-11-14 Thread Dmitry Dolgov
> On Wed, 14 Nov 2018 at 00:44, Thomas Munro > wrote: > > Here is a rebased version of the patch, post pread()/pwrite(). I have > also rewritten the commit message to try to explain the rationale > concisely, instead of requiring the reader to consult multiple > discussions that jump between len

Re: Refactoring the checkpointer's fsync request queue

2018-11-13 Thread Thomas Munro
(Replies to a couple of different messages below) On Wed, Nov 14, 2018 at 6:04 AM Robert Haas wrote: > On Sun, Nov 11, 2018 at 9:59 PM Thomas Munro > wrote: > > There is one major problem with this patch > > If there's only one, you're doing great! Although admittedly this > seems like a big on

Re: Refactoring the checkpointer's fsync request queue

2018-11-13 Thread Robert Haas
On Tue, Nov 13, 2018 at 1:07 PM Andres Freund wrote: > On 2018-11-13 12:04:23 -0500, Robert Haas wrote: > > I still feel like this whole pass-the-fds-to-the-checkpointer thing is > > a bit of a fool's errand, though. I mean, there's no guarantee that > > the first FD that gets passed to the check

Re: Refactoring the checkpointer's fsync request queue

2018-11-13 Thread Andres Freund
Hi, On 2018-11-13 12:04:23 -0500, Robert Haas wrote: > I still feel like this whole pass-the-fds-to-the-checkpointer thing is > a bit of a fool's errand, though. I mean, there's no guarantee that > the first FD that gets passed to the checkpointer is the first one > opened, or even the first one

Re: Refactoring the checkpointer's fsync request queue

2018-11-13 Thread Andres Freund
Hi, On 2018-11-12 15:58:41 +1300, Thomas Munro wrote: > There is one major problem with this patch: BufferSync(), run in the > checkpointer, can deadlock against a backend that holds a buffer lock > and is blocked in SendFsyncRequest(). To break this deadlock, we need > way out of it on either t

Re: Refactoring the checkpointer's fsync request queue

2018-11-13 Thread Robert Haas
On Sun, Nov 11, 2018 at 9:59 PM Thomas Munro wrote: > There is one major problem with this patch If there's only one, you're doing great! Although admittedly this seems like a big one... > 1. Go back to the current pressure-valve strategy: make the sending > side perform the fsync(), if it can

Re: Fsync request queue

2018-05-01 Thread Andres Freund
On 2018-05-01 13:43:14 -0400, Robert Haas wrote: > On Tue, May 1, 2018 at 1:41 PM, Andres Freund wrote: > > I unfortunately don't have access to the relevant reports anymore, so > > it's only by memory. What I do remember is that a few I saw > > pg_stat_bgwriter.buffers_backend_fsync values that w

Re: Fsync request queue

2018-05-01 Thread Robert Haas
On Tue, May 1, 2018 at 1:41 PM, Andres Freund wrote: > I unfortunately don't have access to the relevant reports anymore, so > it's only by memory. What I do remember is that a few I saw > pg_stat_bgwriter.buffers_backend_fsync values that we a pretty sizable > fraction of the buffers written by b

Re: Fsync request queue

2018-05-01 Thread Andres Freund
On 2018-05-01 13:21:21 -0400, Robert Haas wrote: > On Mon, Apr 30, 2018 at 7:08 PM, Andres Freund wrote: > >> True, but has anyone ever actually observed a non-zero > >> pg_stat_bgwriter.buffers_backend_fsync in the wild after the > >> compaction queue stuff was added/backpatched? > > > > Yes. >

Re: Fsync request queue

2018-05-01 Thread Robert Haas
On Mon, Apr 30, 2018 at 7:08 PM, Andres Freund wrote: >> True, but has anyone ever actually observed a non-zero >> pg_stat_bgwriter.buffers_backend_fsync in the wild after the >> compaction queue stuff was added/backpatched? > > Yes. Care to elaborate? -- Robert Haas EnterpriseDB: http://www.en

Re: Fsync request queue

2018-04-30 Thread Andres Freund
On 2018-04-30 16:07:48 -0700, Peter Geoghegan wrote: > On Mon, Apr 30, 2018 at 4:03 PM, Andres Freund wrote: > >> Is this a problem in practice, though? I don't remember seeing any reports > >> of the fsync queue filling up, after we got the code to compact it. I don't > >> know if anyone has been

Re: Fsync request queue

2018-04-30 Thread Peter Geoghegan
On Mon, Apr 30, 2018 at 4:03 PM, Andres Freund wrote: >> Is this a problem in practice, though? I don't remember seeing any reports >> of the fsync queue filling up, after we got the code to compact it. I don't >> know if anyone has been looking for that, so that might also explain the >> absence

Re: Fsync request queue

2018-04-30 Thread Andres Freund
On 2018-04-25 09:19:52 +0300, Heikki Linnakangas wrote: > On 24/04/18 21:00, Andres Freund wrote: > > Right now if the queue is full and can't be compacted we end up > > fsync()ing on every single write, rather than once per checkpoint > > afaict. That's a fairly horrible. > > > > For the case tha

Re: Fsync request queue

2018-04-24 Thread Heikki Linnakangas
? I don't remember seeing any reports of the fsync queue filling up, after we got the code to compact it. I don't know if anyone has been looking for that, so that might also explain the absence of reports, though. The fsync request queue often is fairly large. 20 bytes for eac

Fsync request queue

2018-04-24 Thread Andres Freund
Hi, While thinking about the at the fsync mess, I started looking at the fsync request queue. I was primarily wondering whether we can keep FDs open long enough (by forwarding them to the checkpointer) to guarantee that we see the error. But that's mostly irrelevant for what I'm wonde