Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Attached is the patch as submitted for commitfest. Please note, I am not adverse to adding an additional --Include-path directive if that avoids backwards-compatibility problems. However the patch is complex enough I would really prefer review on the rest of it to start first. This doesn't strike me as perfectly trivial and I think it is easier to review pieces separately. I have already started on the --Include-path directive and could probably get it attached to a later version of the patch very soon. I would also like to address a couple of important points here: 1. I think default restrictions plus additional paths is the best, safest way forward. Excluding shell-globs doesn't solve the "I need this particular config file" very well particularly if we want to support this outside of an internal environment. Shell globs also tend to be overly inclusive and so if you exclude based on them, you run into a chance that your rewind is corrupt for being overly exclusive. 2. I would propose any need for an additional paths be specified using an --Include-path directive. This could be specified multiple times and could point to a file or directory which would be added to the paths rewound. I have a patch for this mostly done but I am concerned that these sorts of changes result in a combination of changes that are easier to review separately than together. So if needed, I can add it or in a separate patch following. 3. I think it would be a mistake to tie backup solutions in non-replicated environments to replication use cases, and vice versa. Things like replication slots (used for streaming backups) have different considerations in different environments. Rather than build the same infrastructure first, I think it is better to support different use cases well and then build common infrastructure to support the different cases. I am not against building common infrastructure for pg_rewind and pg_basebackup. I am very much against having the core guarantees being the exact same. Best Wishes, Chris Travers On Sat, Oct 28, 2017 at 1:22 PM, Chris Travers wrote: > Hi; > > There are still some cleanup bits needed here but I wanted to get feedback > on my general direction. > > I hope to submit for commit fest soon if the general feedback is good. > Tests are passing (with adjustments intended for change of behaviour in one > test script). I want to note that Crimson Thompson (also with Adjust) has > provided some valuable discussion on this code. > > The Problem: > > pg_rewind makes no real guarantees regarding the final state of non-data > files or directories. It.checks to see if the timeline has incremented > (and therefore guarantees that if successful the data directories are on > the same timeline) but for non-data files, these are clobbered if we rewind > and left intact if not. These other files include postgresql.auto.conf, > replication slots, and can include log files. > > Copying logs over to the new slave is something one probably never wants > to do (same with replication slots), and the behaviours here can mask > troubleshooting regarding what a particular master failed, cause wal > segments to build up, automatic settings changes, and other undesirable > behaviours. Together these make pg_rewind very difficult to use properly > and push tasks to replication management tooling that the management tools > are not in a good position to handle correctly. > > Designing the Fix: > > Two proposed fixes have been considered and one selected: Either we > whitelist directories and only rewind those. The other was to allow shell > globs to be provided that could be used to exclude files. The whitelisting > solution was chosen for a number of reasons. > > When pg_rewind "succeeds" but not quite correctly the result is usually a > corrupted installation which requires a base backup to replace it anyway. > In a recovery situation, sometimes pressures occur which render human > judgment less effective, and therefore glob exclusion strikes me as > something which would probably do more harm than good, but maybe I don't > understand the use case (comments as to why some people look at the other > solution as preferable would be welcome). > > In going with the whitelisting solution, we chose to include all > directories with WAL-related information.This allows more predicable > interaction with other parts of the replication chain. Consequently not > only do we copy pg_wal and pg_xact but also commit timestamps and so forth. > > The Solution: > > The solution is a whitelist of directories specified which are the only > ones which are synchronised. The relevant part of this patch is: > > +/* List of directories to synchronize: > > + * base data dirs (and ablespaces) > > + * wal/transaction data > > + * and that is it. > > + * > > + * This array is null-terminated to make > > + * it easy to expand > > + */ > > + > > +const char *rewind_dirs[] = { > > +"base", > > +"global", >
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On 10/30/17 6:36 AM, Michael Paquier wrote: > On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers >> >> How does rep mgr or other programs using pg_rewind know what to exclude? > > Good question. Answers could come from folks such as David Steele > (pgBackRest) or Marco (barman) whom I am attaching in CC. pgBackRest does not use pg_rewind. However, pgBackRest does use the same exclusion list for backups as pg_basebackup. -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Mon, Oct 30, 2017 at 11:36 AM, Michael Paquier wrote: > On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers > wrote: > > This also brings up a fairly major concern more generally about control > by > > the way. A lot of cases where pg_rewind is called, the user doesn't > > necessarily have much control on how it is called. Moreover in many of > > these cases, the user is probably not in a position to understand the > > internals well enough to grasp what to check after. > > Likely they are not. > So currently I am submitting the current patch to commit fest. I am open to adding a new --Include-path option in this patch, but I am worried about atomicity concerns, and I am still not really sure what the impact is for you (I haven't heard whether you expect this file to be there before the rewind, i.e. whether it would be on both master and slave, or just on the slave). Sp there is the question of under what specific circumstances this would break for you. > > > Right, but there is a use case difference between "I am taking a backup > of a > > server" and "I need to get the server into rejoin the replication as a > > standby." > > The intersection of the first and second categories is not empty. Some > take backups and use them to deploy standbys. > Sure, but it is not complete either. > > > A really good example of where this is a big problem is with replication > > slots. On a backup I would expect you want replication slots to be > copied > > in. > > I would actually expect the contrary, and please note that replication > slots are not taken in a base backup, which is what the documentation > says as well: > https://www.postgresql.org/docs/10/static/protocol-replication.html > "pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots, > pg_stat_tmp, and pg_subtrans are copied as empty directories (even if > they are symbolic links)." > Many of these are emptied on server restart but repslot is not. What is the logic for excluding it from backups other than to avoid problems with replication provisioning? I mean if I have a replication slot for taking backups with barman (and I am not actually doing replication) why would I *not* want that in my base backup if I might have to restore to a new machine somewhere? > > Some code I have with 9.6's pg_bsaebackup removes manually replication > slots as this logic is new in 10 ;) > > >> The pattern that base backups now use is an exclude list. In the > >> future I would rather see base backups and pg_rewind using the same > >> infrastructure for both things: > >> - pg_rewind should use the replication protocol with BASE_BACKUP. > >> Having it rely on root access now is crazy. > >> - BASE_BACKUP should have an option where it is possible to exclude > >> custom paths. > >> What you are proposing here would make both diverge more, which is in > >> my opinion not a good approach. > > > > How does rep mgr or other programs using pg_rewind know what to exclude? > > Good question. Answers could come from folks such as David Steele > (pgBackRest) or Marco (barman) whom I am attaching in CC. > Two points that further occur to me: Shell globs seem to me to be foot guns in this area, I think special paths should be one path per invocation of the option not "--exclude=pg_w*" since this is just asking for problems as time goes on and things get renamed. It also seems to me that adding specific paths is far safer than removing specific paths. > -- > Michael > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers wrote: > This also brings up a fairly major concern more generally about control by > the way. A lot of cases where pg_rewind is called, the user doesn't > necessarily have much control on how it is called. Moreover in many of > these cases, the user is probably not in a position to understand the > internals well enough to grasp what to check after. Likely they are not. > Right, but there is a use case difference between "I am taking a backup of a > server" and "I need to get the server into rejoin the replication as a > standby." The intersection of the first and second categories is not empty. Some take backups and use them to deploy standbys. > A really good example of where this is a big problem is with replication > slots. On a backup I would expect you want replication slots to be copied > in. I would actually expect the contrary, and please note that replication slots are not taken in a base backup, which is what the documentation says as well: https://www.postgresql.org/docs/10/static/protocol-replication.html "pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots, pg_stat_tmp, and pg_subtrans are copied as empty directories (even if they are symbolic links)." Some code I have with 9.6's pg_bsaebackup removes manually replication slots as this logic is new in 10 ;) >> The pattern that base backups now use is an exclude list. In the >> future I would rather see base backups and pg_rewind using the same >> infrastructure for both things: >> - pg_rewind should use the replication protocol with BASE_BACKUP. >> Having it rely on root access now is crazy. >> - BASE_BACKUP should have an option where it is possible to exclude >> custom paths. >> What you are proposing here would make both diverge more, which is in >> my opinion not a good approach. > > How does rep mgr or other programs using pg_rewind know what to exclude? Good question. Answers could come from folks such as David Steele (pgBackRest) or Marco (barman) whom I am attaching in CC. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Mon, Oct 30, 2017 at 10:57 AM, Michael Paquier wrote: > On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers > wrote: > > Are there any cases right now where you have features added by > extensions that write to directories which are required for a rewind? > > In some of the stuff I maintain, I actually have one case now of a > configuration file included with include_if_exists by postgresql.conf > and is expected to be generated by a component that my code doing the > rewind has no direct access on... I can control how pg_rewind kicks > in, but I think that you would actually break silently the current > code, which is scary especially if I don't control the code where > pg_rewind is called when Postgres 11 gets integrated into the product > I am thinking about, and even more scary if there is no way to include > something. > Ok, so there is an argument that there needs to be a way to include additional paths in this patch. One important question I would have in these cases is if you expect these to be set only on the master. If not, then is this a problem and if not then how do you handle the fail-back problem etc? This also brings up a fairly major concern more generally about control by the way. A lot of cases where pg_rewind is called, the user doesn't necessarily have much control on how it is called. Moreover in many of these cases, the user is probably not in a position to understand the internals well enough to grasp what to check after. > > > The problem with an exclude list is that we cannot safely exclude > > configuration files or logs (because these could be named anything), > right? > > You have the exact same problem with base backups now, and any > configuration files created by extensions are a per-case problem... > Right, but there is a use case difference between "I am taking a backup of a server" and "I need to get the server into rejoin the replication as a standby." A really good example of where this is a big problem is with replication slots. On a backup I would expect you want replication slots to be copied in. However when setting yourself up as a slave you most certainly do not want to just copy these over from the master unless you have infinite disk space. I would argue that under *no* circumstance should pg_rewind *ever* copy replication slots. But pg_base_backup really *should* (and when provisioning a new slave you should clear these as soon as it is set up). The pattern that base backups now use is an exclude list. In the > future I would rather see base backups and pg_rewind using the same > infrastructure for both things: > - pg_rewind should use the replication protocol with BASE_BACKUP. > Having it rely on root access now is crazy. > - BASE_BACKUP should have an option where it is possible to exclude > custom paths. > What you are proposing here would make both diverge more, which is in > my opinion not a good approach. > How does rep mgr or other programs using pg_rewind know what to exclude? Again I am not convinced setting up a replica and taking a backup for disaster recovery are the same use case or have the same requirements. -- > Michael > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers wrote: > Are there any cases right now where you have features added by extensions > that write to directories which are required for a rewind? In some of the stuff I maintain, I actually have one case now of a configuration file included with include_if_exists by postgresql.conf and is expected to be generated by a component that my code doing the rewind has no direct access on... I can control how pg_rewind kicks in, but I think that you would actually break silently the current code, which is scary especially if I don't control the code where pg_rewind is called when Postgres 11 gets integrated into the product I am thinking about, and even more scary if there is no way to include something. > The problem with an exclude list is that we cannot safely exclude > configuration files or logs (because these could be named anything), right? You have the exact same problem with base backups now, and any configuration files created by extensions are a per-case problem... The pattern that base backups now use is an exclude list. In the future I would rather see base backups and pg_rewind using the same infrastructure for both things: - pg_rewind should use the replication protocol with BASE_BACKUP. Having it rely on root access now is crazy. - BASE_BACKUP should have an option where it is possible to exclude custom paths. What you are proposing here would make both diverge more, which is in my opinion not a good approach. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
First, thanks for your thoughts on this, and I am interested in probing them more. On Mon, Oct 30, 2017 at 9:04 AM, Michael Paquier wrote: > On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers > wrote: > > The Solution: > > The solution is a whitelist of directories specified which are the only > ones > > which are synchronised. The relevant part of this patch is: > > > > +/* List of directories to synchronize: > > + * base data dirs (and ablespaces) > > + * wal/transaction data > > + * and that is it. > > + * > > + * This array is null-terminated to make > > + * it easy to expand > > + */ > > > > +const char *rewind_dirs[] = { > > +"base", > > +"global", > > +"pg_commit_ts", > > +"pg_logical", > > +"pg_multixact", > > +"pg_serial", > > +"pg_subtrans", > > +"pg_tblspc", > > +"pg_twophase", > > +"pg_wal", > > +"pg_xact", > > +NULL > > +}; > > The problem with a white list, which is one reason why I do not favour > it, is in the case where a feature adds in the data folder a new path > for its data, particularly in the case where this is critical for a > base backup. If this folder is not added in this list, then a rewind > may be silently corrupted as well. There are also a set of directories > in this list that we do not care about, pg_serial being one. > pg_subtrans is a second, as it gets zeroed on startup. > The problem with an exclude list is that we cannot safely exclude configuration files or logs (because these could be named anything), right? Are there any cases right now where you have features added by extensions that write to directories which are required for a rewind? I am asking because I would like to see if this is the minimum working change or if this change is fundamentally broken currently and must be extended to allow custom paths to be sync'd as well. > > And if you think about it, pg_rewind is actually the *same* processing > as a base backup taken using the replication protocol. So instead of > this patch I would recommend using excludeFiles and excludeDirContents > by moving this list to a common header where frontend and backend can > refer to. Note that basebackup.h includes replnodes.h, so my thoughts > is that you should split the current header with something like > basebackup_internal.h which is backend-only, and have pg_rewind fetch > the list of directories to automatically ignore as those ones. You can > also simplify a bit the code of pg_rewind a bit so as things like > postmaster.opts. On top of that, there is visibly no need to tweak the > SQL query fetching the directory list (which is useful for debugging > as shaped now actually), but just change process_source_file so as its > content is not added in the file map. > I am not sure it *should* be the same, however. In a backup we probably want to backup the postgresql.auto.conf, but on a failover, I don't think we want to clobber configuration. We certainly do not want to sometimes clobber configuration but not other times (which is what happens right now in some cases). And under no circumstances do we want to clobber logs on a failed server with logs on a working server. That's asking for serious problems in my view. If you think about it, there's a huge difference in use case in backing up a database cluster (Including replication slots, configs in the dir etc) and re-syncing the data so that replication can resume, and I think there are some dangers that come up when assuming these should be closely tied together. > > Then there is a second case where you do not want a set of folders to > be included, but those can be useful by default, an example here being > pg_wal where one might want to have an empty path to begin with. A > third case is a set of folders that you do not care about, but depends > on the deployment, being here "log" for example. For those you could > just add an --exclude-path option which simply piles up paths into a > linked list when called multiple times. This could happen with a > second patch. > Agreed. And one could add an "--include-path" option to allow for unusual cases where you want extra directories, such as replication slots, or the like. I think another patch could also specifically empty and perhaps create a replication slot allowing for one to bring tings back up via streaming replication more safely. > > > From there we iterate over this array for directory-based approaches in > > copy_fetch.c and with one query per directory in libpq_fetch.c. This > also > > means shifting from the basic interface from PQexec to PQexecParams. It > > would be possible to move to binary formats too, but this was not done > > currently in order to simplify code review (that could be a separate > > independent patch at a later time). > > -res = PQexec(conn, sql); > +for (p = 0; rewind_dirs[p] != NULL; ++p) > +{ > +const char *paths[1]; > +paths[0] = rewind_dirs[p]; > +res = PQexecParams(conn, sql,
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers wrote: > The Solution: > The solution is a whitelist of directories specified which are the only ones > which are synchronised. The relevant part of this patch is: > > +/* List of directories to synchronize: > + * base data dirs (and ablespaces) > + * wal/transaction data > + * and that is it. > + * > + * This array is null-terminated to make > + * it easy to expand > + */ > > +const char *rewind_dirs[] = { > +"base", > +"global", > +"pg_commit_ts", > +"pg_logical", > +"pg_multixact", > +"pg_serial", > +"pg_subtrans", > +"pg_tblspc", > +"pg_twophase", > +"pg_wal", > +"pg_xact", > +NULL > +}; The problem with a white list, which is one reason why I do not favor it, is in the case where a feature adds in the data folder a new path for its data, particularly in the case where this is critical for a base backup. If this folder is not added in this list, then a rewind may be silently corrupted as well. There are also a set of directories in this list that we do not care about, pg_serial being one. pg_subtrans is a second, as it gets zeroed on startup. And if you think about it, pg_rewind is actually the *same* processing as a base backup taken using the replication protocol. So instead of this patch I would recommend using excludeFiles and excludeDirContents by moving this list to a common header where frontend and backend can refer to. Note that basebackup.h includes replnodes.h, so my thoughts is that you should split the current header with something like basebackup_internal.h which is backend-only, and have pg_rewind fetch the list of directories to automatically ignore as those ones. You can also simplify a bit the code of pg_rewind a bit so as things like postmaster.opts. On top of that, there is visibly no need to tweak the SQL query fetching the directory list (which is useful for debugging as shaped now actually), but just change process_source_file so as its content is not added in the file map. Then there is a second case where you do not want a set of folders to be included, but those can be useful by default, an example here being pg_wal where one might want to have an empty path to begin with. A third case is a set of folders that you do not care about, but depends on the deployment, being here "log" for example. For those you could just add an --exclude-path option which simply piles up paths into a linked list when called multiple times. This could happen with a second patch. > From there we iterate over this array for directory-based approaches in > copy_fetch.c and with one query per directory in libpq_fetch.c. This also > means shifting from the basic interface from PQexec to PQexecParams. It > would be possible to move to binary formats too, but this was not done > currently in order to simplify code review (that could be a separate > independent patch at a later time). -res = PQexec(conn, sql); +for (p = 0; rewind_dirs[p] != NULL; ++p) +{ +const char *paths[1]; +paths[0] = rewind_dirs[p]; +res = PQexecParams(conn, sql, 1, NULL, paths, NULL, NULL, 0); Calling multiple times the query to list all directories is messy IMO. And this is N-costly processing if there are many files to look at, say many relation files. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Sat, Oct 28, 2017 at 4:52 PM, Chris Travers wrote: > There are still some cleanup bits needed here but I wanted to get feedback > on my general direction. > > I hope to submit for commit fest soon if the general feedback is good. I think you should submit to the CommitFest regardless, to increase your chances of getting feedback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Hi; There are still some cleanup bits needed here but I wanted to get feedback on my general direction. I hope to submit for commit fest soon if the general feedback is good. Tests are passing (with adjustments intended for change of behaviour in one test script). I want to note that Crimson Thompson (also with Adjust) has provided some valuable discussion on this code. The Problem: pg_rewind makes no real guarantees regarding the final state of non-data files or directories. It.checks to see if the timeline has incremented (and therefore guarantees that if successful the data directories are on the same timeline) but for non-data files, these are clobbered if we rewind and left intact if not. These other files include postgresql.auto.conf, replication slots, and can include log files. Copying logs over to the new slave is something one probably never wants to do (same with replication slots), and the behaviours here can mask troubleshooting regarding what a particular master failed, cause wal segments to build up, automatic settings changes, and other undesirable behaviours. Together these make pg_rewind very difficult to use properly and push tasks to replication management tooling that the management tools are not in a good position to handle correctly. Designing the Fix: Two proposed fixes have been considered and one selected: Either we whitelist directories and only rewind those. The other was to allow shell globs to be provided that could be used to exclude files. The whitelisting solution was chosen for a number of reasons. When pg_rewind "succeeds" but not quite correctly the result is usually a corrupted installation which requires a base backup to replace it anyway. In a recovery situation, sometimes pressures occur which render human judgment less effective, and therefore glob exclusion strikes me as something which would probably do more harm than good, but maybe I don't understand the use case (comments as to why some people look at the other solution as preferable would be welcome). In going with the whitelisting solution, we chose to include all directories with WAL-related information.This allows more predicable interaction with other parts of the replication chain. Consequently not only do we copy pg_wal and pg_xact but also commit timestamps and so forth. The Solution: The solution is a whitelist of directories specified which are the only ones which are synchronised. The relevant part of this patch is: +/* List of directories to synchronize: + * base data dirs (and ablespaces) + * wal/transaction data + * and that is it. + * + * This array is null-terminated to make + * it easy to expand + */ + +const char *rewind_dirs[] = { +"base", +"global", +"pg_commit_ts", +"pg_logical", +"pg_multixact", +"pg_serial", +"pg_subtrans", +"pg_tblspc", +"pg_twophase", +"pg_wal", +"pg_xact", +NULL +}; From there we iterate over this array for directory-based approaches in copy_fetch.c and with one query per directory in libpq_fetch.c. This also means shifting from the basic interface from PQexec to PQexecParams. It would be possible to move to binary formats too, but this was not done currently in order to simplify code review (that could be a separate independent patch at a later time). Testing Done: The extra files tests correctly test this change in behaviour. The tests have been modified, and diagnostics in cases of failures expanded, in this case. The other tests provide good coverage of whether pg_rewind does what it is supposed to do. Cleanup still required: I accidentally left Carp::Always in the PM in this perl module. It will be fixed. I renamed one of the functions used to have a more descriptive name but currently did not remove the old function yet. Feedback is very welcome. pg_rewind is a very nice piece of software. I am hoping that these sorts of changes will help ensure that it is easier to use and provides more predictable results. -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin pg_rewind_restrict.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers