Re: [HACKERS] Adding hook in BufferSync for backup purposes
Hi hackers!

> 8 Aug 2017, at 11:27, Tom Lane wrote:
>
> My point is not to claim that we mustn't put a hook there. It's that what
> such a hook could safely do is tightly constrained, and you've not offered
> clear evidence that there's something useful to be done within those
> constraints. Alvaro seems to think that the result might be better
> reached by hooking in at other places, and my gut reaction is similar.

I was studying the work done by Marco and Gabriele on tracking pages for incremental backup, and I found the PTRACK patch by Yury Zhuravlev and PostgresPro [0]. This work seems solid and thoughtful.

I have drafted a new prototype of hooks enabling incremental backup as an extension, based on the PTRACK calls. If you look at the original patch you can see that the attached patch differs slightly: the functionality to track the end of a critical section is omitted. I have not included it in this draft because it changes a very sensitive place, even for those who do not need it. A subscriber of the proposed hook must remember that it is invoked inside a critical section. But there cannot be more than XLR_MAX_BLOCK_ID blocks for one WAL record.

The proposed draft does not add any functionality to track finished transactions or any other atomic unit of work; it just provides a stream of changed block numbers. Neither does this draft make any assumption about where to store this information. I suppose subscribers could trigger asynchronous writes somewhere once the info for a given segment has been accumulated (do we need a hook on segment end, then?). During incremental backup you can then skip scanning those WAL segments for which you have accumulated the changeset of block numbers.

The thing which is not clear to me: if we are accumulating block numbers inside a critical section, we must place them into a preallocated array. When is the best time to take these block numbers away, to empty that array and avoid overflow? PTRACK has an array of length XLR_MAX_BLOCK_ID and saves it at the end of each critical section. But I want to avoid intervening in critical sections.

Thank you for your attention, any thoughts will be appreciated.

Best regards, Andrey Borodin.

[0] https://gist.github.com/stalkerg/ab833d94e2f64df241f1835651e06e4b

0001-hooks-to-watch-for-changed-pages.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
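Andrey's open question above, when to drain a preallocated array that is filled inside a critical section, can be sketched as a double-buffered accumulator: the hook only appends (no allocation, nothing that can error out), and a consumer swaps the data out later, outside the critical section. This is an illustrative sketch, not code from the patch; the type names and the fixed capacity are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for an XLR_MAX_BLOCK_ID-sized preallocated buffer (size assumed). */
#define TRACK_CAPACITY 32

typedef struct BlockAccumulator
{
	uint32_t	blocks[TRACK_CAPACITY];	/* filled while in a critical section */
	int			count;
} BlockAccumulator;

static BlockAccumulator active;		/* written by the hook */
static BlockAccumulator drained;	/* handed to the subscriber */

/* Called from the hook: must not allocate or throw (critical section). */
static int
track_block(uint32_t blkno)
{
	if (active.count >= TRACK_CAPACITY)
		return 0;				/* full: must be drained before more fit */
	active.blocks[active.count++] = blkno;
	return 1;
}

/* Called outside any critical section: take accumulated block numbers away. */
static int
drain_blocks(void)
{
	int			n = active.count;

	memcpy(drained.blocks, active.blocks, n * sizeof(uint32_t));
	drained.count = n;
	active.count = 0;			/* empty the array, avoiding overflow */
	return n;
}
```

The point of the double buffer is that the only work done under the critical section is an array append; everything that could fail (I/O, allocation) happens on the drained copy.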
Re: [HACKERS] Adding hook in BufferSync for backup purposes
Andrey Borodin writes:
> 7 Aug 2017, at 18:37, Tom Lane wrote:
>> Yeah. Keep in mind that if the extension does anything at all that could
>> possibly throw an error, and if that error condition persists across
>> multiple tries, you will have broken the database completely: it will
>> be impossible to complete a checkpoint, and your WAL segment pool will
>> grow until it exhausts disk. So the idea of doing something that involves
>> unspecified extension behavior, especially possible interaction with
>> an external backup agent, right there is pretty terrifying.

> I think that an API for extensions should tend to protect the developer from
> breaking everything, but may allow it, with precaution warnings in docs
> and comments. Please let me know if this assumption is incorrect.

My point is not to claim that we mustn't put a hook there. It's that what such a hook could safely do is tightly constrained, and you've not offered clear evidence that there's something useful to be done within those constraints. Alvaro seems to think that the result might be better reached by hooking in at other places, and my gut reaction is similar.

regards, tom lane
Re: [HACKERS] Adding hook in BufferSync for backup purposes
Alvaro, Tom, thank you for your valuable comments.

> Alvaro:
> I remember discussing the topic of differential base-backups with
> somebody (probably Marco and Gabriele). The idea we had was to have a
> new relation fork which stores an LSN for each group of pages,
> indicating the LSN of the newest change to those pages. The backup tool
> "scans" the whole LSN fork, and grabs images of all pages that have LSNs
> newer than the one used for the previous base backup.

Thanks for the pointer; I've found the discussions and I'm now in the process of extracting the knowledge from them.

> I think it should be at the point where the buffer is
> modified (i.e. when WAL is written) rather than when it's checkpointed
> out.

WAL is flushed before the actual pages are written to disk (sent to the kernel). I'd like to notify extensions right after we are sure the pages were flushed. But you are right, BufferSync is not a good place for this:
1. It lacks LSNs.
2. It is not the only place where buffers are flushed: the bgwriter goes through the nearby function FlushBuffer(), and many AMs write directly to smgr (for example, when a metapage is created).

BufferSync() seemed such a comfortable and efficient place for flushing info on dirty pages, already sorted and grouped by tablespace, but it is absolutely incorrect to do it there. I'll look for a better place.

> 7 Aug 2017, at 18:37, Tom Lane wrote:
>
> Yeah. Keep in mind that if the extension does anything at all that could
> possibly throw an error, and if that error condition persists across
> multiple tries, you will have broken the database completely: it will
> be impossible to complete a checkpoint, and your WAL segment pool will
> grow until it exhausts disk. So the idea of doing something that involves
> unspecified extension behavior, especially possible interaction with
> an external backup agent, right there is pretty terrifying.

I think that an API for extensions should tend to protect the developer from breaking everything, but may allow it, with precaution warnings in docs and comments. Please let me know if this assumption is incorrect.

> Other problems with the proposed patch: it misses coverage of
> BgBufferSync, and I don't like exposing an ad-hoc structure like
> CkptTsStatus as part of an extension API. The algorithm used by
> BufferSync to schedule buffer writes has changed multiple times
> before and doubtless will again; if we're going to have a hook
> here it should depend as little as possible on those details.

OK, now I see that "buf_internals.h" has the word "internals" in it for a reason. Thanks for pointing that out; I didn't know about the changes in these algorithms.

Best regards, Andrey Borodin, Yandex.
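Alvaro's suggestion to track changes at modification time (where WAL is written) rather than at flush time reduces, in its simplest form, to maintaining a changed-block map that none of the flush paths Andrey lists (BufferSync, FlushBuffer, direct smgr writes) need to know about. A minimal single-segment sketch, assuming one bit per block; every name and the granularity here are hypothetical, not from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One bit per block in a single (hypothetical) relation segment. */
#define SEG_BLOCKS 1024

static uint8_t changed_map[SEG_BLOCKS / 8];

/* Called at page-modification time, conceptually where WAL is written. */
static void
mark_block_changed(uint32_t blkno)
{
	changed_map[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
}

/* Backup tool query: has this block changed since the last reset? */
static int
block_changed(uint32_t blkno)
{
	return (changed_map[blkno / 8] >> (blkno % 8)) & 1;
}

/* After a backup consumes the map, reset it for the next cycle. */
static void
reset_changed_map(void)
{
	memset(changed_map, 0, sizeof(changed_map));
}
```

Because the map is updated at modification time, it does not matter which of the several code paths eventually writes the buffer out.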
Re: [HACKERS] Adding hook in BufferSync for backup purposes
Alvaro Herrera writes:
> I suppose your hook idea lets you implement the LSN fork in an
> extension, rather than having it be part of core. The idea of hooking
> onto BufferSync makes me uneasy, though -- like it's not the correct
> place to do it.

Yeah. Keep in mind that if the extension does anything at all that could possibly throw an error, and if that error condition persists across multiple tries, you will have broken the database completely: it will be impossible to complete a checkpoint, and your WAL segment pool will grow until it exhausts disk. So the idea of doing something that involves unspecified extension behavior, especially possible interaction with an external backup agent, right there is pretty terrifying.

Other problems with the proposed patch: it misses coverage of BgBufferSync, and I don't like exposing an ad-hoc structure like CkptTsStatus as part of an extension API. The algorithm used by BufferSync to schedule buffer writes has changed multiple times before and doubtless will again; if we're going to have a hook here it should depend as little as possible on those details.

regards, tom lane
Re: [HACKERS] Adding hook in BufferSync for backup purposes
Andrey Borodin wrote:
> ==What==
> I propose to add a hook inside the BufferSync() function to inform extensions
> that we are going to write pages to disk. Please see the patch attached. I pass
> a timestamp of the checkpoint, but it would be good if we could also pass the
> number of the checkpoint or something like this, to ensure that some checkpoints
> were not lost (this could yield malformed backups).
>
> ==State==
> This is just an idea to discuss; I could not find something like this in
> pgsql-hackers as of now. Neither could I find similar hooks in the code.
> Is this hook sufficient to implement page tracking for differential backups?
> I'm not sure, but it seems like it is.

Hi,

I remember discussing the topic of differential base-backups with somebody (probably Marco and Gabriele). The idea we had was to have a new relation fork which stores an LSN for each group of pages, indicating the LSN of the newest change to those pages. The backup tool "scans" the whole LSN fork, and grabs images of all pages that have LSNs newer than the one used for the previous base backup. (I think your sketch above should use LSNs rather than timestamps.)

I suppose your hook idea lets you implement the LSN fork in an extension, rather than having it be part of core. The idea of hooking onto BufferSync makes me uneasy, though -- like it's not the correct place to do it. I think it should be at the point where the buffer is modified (i.e. when WAL is written) rather than when it's checkpointed out.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
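The LSN-fork idea Alvaro describes can be sketched as a per-group high-water mark: every modification raises its group's LSN, and the backup tool selects groups whose LSN is newer than the LSN of the previous base backup. A simplified in-memory sketch; the granularity and all names are assumptions, and a real fork would be persistent, per relation:

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_SIZE	8		/* pages per LSN-fork entry (assumed granularity) */
#define NUM_GROUPS	16

static uint64_t group_lsn[NUM_GROUPS];	/* LSN of the newest change per group */

/* On modification: remember the newest LSN that touched this block's group. */
static void
note_change(uint32_t blkno, uint64_t lsn)
{
	uint32_t	g = blkno / GROUP_SIZE;

	if (lsn > group_lsn[g])
		group_lsn[g] = lsn;
}

/*
 * Backup pass: "scan the whole LSN fork" and collect groups changed since
 * the previous base backup.  Returns the number of groups written to out.
 */
static int
groups_to_backup(uint64_t prev_backup_lsn, uint32_t *out, int out_cap)
{
	int			n = 0;

	for (uint32_t g = 0; g < NUM_GROUPS && n < out_cap; g++)
		if (group_lsn[g] > prev_backup_lsn)
			out[n++] = g;
	return n;
}
```

Using LSNs rather than timestamps, as Alvaro notes, makes the comparison exact: the previous backup's start LSN is a well-defined point in the WAL stream.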
[HACKERS] Adding hook in BufferSync for backup purposes
Hi, hackers! I want to propose adding a hook in BufferSync.

==Why==
So that extensions could track pages changed between checkpoints. I think this can allow efficient differential backups taken right after a checkpoint. And this functionality can be implemented as an extension.

==What==
I propose to add a hook inside the BufferSync() function to inform extensions that we are going to write pages to disk. Please see the patch attached. I pass a timestamp of the checkpoint, but it would be good if we could also pass the number of the checkpoint or something like this, to ensure that some checkpoints were not lost (this could yield malformed backups).

==State==
This is just an idea to discuss; I could not find something like this in pgsql-hackers as of now. Neither could I find similar hooks in the code.
Is this hook sufficient to implement page tracking for differential backups? I'm not sure, but it seems like it is.

==Questions==
Is this call enough to gather information about changed pages for backup purposes?
Can we call extensions reliably from the checkpointer process?
Can we guarantee that an extension won't miss our call, or defer BufferSync if the extension has failed?
Can we provide more flexibility for this hook?

Any thought will be appreciated.
Best regards, Andrey Borodin, Yandex.

From 01d692e01716d6904847e4ce73faabbd7cc9fa97 Mon Sep 17 00:00:00 2001
From: Andrey Borodin
Date: Sat, 5 Aug 2017 12:36:32 +0500
Subject: [PATCH] Add hook to BuferSync

---
 src/backend/storage/buffer/bufmgr.c | 37 +--
 src/include/storage/buf_internals.h | 39 +++--
 2 files changed, 46 insertions(+), 30 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 15795b0c5a..57bb5b89f0 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -76,34 +76,6 @@ typedef struct PrivateRefCountEntry
 /* 64 bytes, about the size of a cache line on common systems */
 #define REFCOUNT_ARRAY_ENTRIES 8
 
-/*
- * Status of buffers to checkpoint for a particular tablespace, used
- * internally in BufferSync.
- */
-typedef struct CkptTsStatus
-{
-	/* oid of the tablespace */
-	Oid			tsId;
-
-	/*
-	 * Checkpoint progress for this tablespace. To make progress comparable
-	 * between tablespaces the progress is, for each tablespace, measured as a
-	 * number between 0 and the total number of to-be-checkpointed pages. Each
-	 * page checkpointed in this tablespace increments this space's progress
-	 * by progress_slice.
-	 */
-	float8		progress;
-	float8		progress_slice;
-
-	/* number of to-be checkpointed pages in this tablespace */
-	int			num_to_scan;
-	/* already processed pages in this tablespace */
-	int			num_scanned;
-
-	/* current offset in CkptBufferIds for this tablespace */
-	int			index;
-} CkptTsStatus;
-
 /* GUC variables */
 bool		zero_damaged_pages = false;
 int			bgwriter_lru_maxpages = 100;
@@ -1764,6 +1736,10 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
 	}
 }
 
+
+/* Hook for plugins to get control in BufferSync() */
+checkpointer_buffer_sync_hook_type checkpointer_buffer_sync_hook = NULL;
+
 /*
  * BufferSync -- Write out all dirty buffers in the pool.
  *
@@ -1926,6 +1902,11 @@ BufferSync(int flags)
 
 	Assert(num_spaces > 0);
 
+	if (checkpointer_buffer_sync_hook)
+		checkpointer_buffer_sync_hook(CheckpointStats.ckpt_write_t,
+									  CkptBufferIds,
+									  per_ts_stat);
+
 	/*
 	 * Build a min-heap over the write-progress in the individual tablespaces,
 	 * and compute how large a portion of the total progress a single
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6fc96..b9e4f19f3a 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -278,8 +278,6 @@ extern PGDLLIMPORT WritebackContext BackendWritebackContext;
 
 /* in localbuf.c */
 extern BufferDesc *LocalBufferDescriptors;
 
-/* in bufmgr.c */
-
 /*
  * Structure to sort buffers per file on checkpoints.
  *
@@ -295,9 +293,46 @@ typedef struct CkptSortItem
 	int			buf_id;
 } CkptSortItem;
 
+/* in bufmgr.c */
 extern CkptSortItem *CkptBufferIds;
 
 /*
+ * Status of buffers to checkpoint for a particular tablespace, used
+ * internally in BufferSync.
+ */
+typedef struct CkptTsStatus
+{
+	/* oid of the tablespace */
+	Oid			tsId;
+
+	/*
+	 * Checkpoint progress for this tablespace. To make progress comparable
+	 * between tablespaces the progress is, for each tablespace, measured as a
+	 * number between 0 and the total number of to-be-checkpointed pages. Each
+	 * page checkpointed in this tablespace increments this space's progress
+	 * by progress_slice.
+	 */
+	float8		progress;
+	float8		progress_slice;
+
+	/* number of to-be checkpointed pages in this tablespace */
+	int			num_to_scan;
+	/* already processed pages in this tablespace */
+	int			num_scanned;
+
+	/* current offset in CkptBufferIds for this tablespace */
+	int