Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Fri, Jan 31, 2014 at 9:29 PM, Fujii Masao wrote: > On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander wrote: >> On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila >> wrote: >>> >>> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao >>> wrote: >>> > Hi, >>> > >>> > The files in pg_stat_tmp directory don't need to be backed up because >>> > they are >>> > basically reset at the archive recovery. So I think it's worth >>> > changing pg_basebackup >>> > so that it skips any files in pg_stat_tmp directory. Thought? >>> >>> I think this is good idea, but can't it also avoid >>> PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in >>> pg_stat_tmp >>> >> >> All stats files should be excluded. IIRC the PGSTAT_STAT_PERMANENT_TMPFILE >> refers to just the global one. You want to exclude based on >> PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc >> stats_temp_directory if it's in PGDATA. > > Attached patch changes basebackup.c so that it skips all files in both > pg_stat_tmp > and stats_temp_directory. Even when a user sets stats_temp_directory > to the directory > other than pg_stat_tmp, we need to skip the files in pg_stat_tmp. Because, > per recent change of pg_stat_statements, the external query file is > always created there. Committed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Mon, Feb 3, 2014 at 6:57 AM, Peter Eisentraut wrote: > On 2/2/14, 10:23 AM, Fujii Masao wrote: >> I'm thinking to change basebackup.c so that it compares the >> name of the directory that it's trying to back up and the setting >> value of log_directory parameter, then, if they are the same, >> it just skips the directory. The patch that I sent upthread does >> this regarding stats_temp_directory. > > I'm undecided on whether log files should be copied, but in case we > decide not to, it needs to be considered whether we at least recreate > the pg_log directory on the standby. Otherwise weird things will happen > when you start the standby, and it would introduce an extra fixup step > to sort that out. Yes, basically we should skip only files under pg_log, but not pg_log directory itself. > Extra credit for doing something useful when pg_log is a symlink. > > I fear, however, that if you end up implementing all that logic, it > would become too much special magic. ISTM that pg_xlog has already been handled in that way by basebackup. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On 2/2/14, 10:23 AM, Fujii Masao wrote: > I'm thinking to change basebackup.c so that it compares the > name of the directory that it's trying to back up and the setting > value of log_directory parameter, then, if they are the same, > it just skips the directory. The patch that I sent upthread does > this regarding stats_temp_directory. I'm undecided on whether log files should be copied, but in case we decide not to, it needs to be considered whether we at least recreate the pg_log directory on the standby. Otherwise weird things will happen when you start the standby, and it would introduce an extra fixup step to sort that out. Extra credit for doing something useful when pg_log is a symlink. I fear, however, that if you end up implementing all that logic, it would become too much special magic. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Sat, Feb 1, 2014 at 2:08 AM, Robert Haas wrote: > On Fri, Jan 31, 2014 at 8:40 AM, Fujii Masao wrote: >> Yeah, I was thinking that, too. I'm not sure whether including log files >> in backup really increases the security risk, though. There are already >> very important data, i.e., database, in backups. Anyway, since >> the amount of log files can be very large and they are not essential >> for recovery, it's worth considering whether to exclude them. OTOH, >> I'm sure that some users prefer current behavior for some reasons. >> So I think that it's better to expose the pg_basebackup option >> specifying whether log files are included in backups or not. > > I don't really see how this can work reliably; pg_log isn't a > hard-coded name, but rather a GUC that users can leave set to that > value or set to any other value they choose. I'm thinking to change basebackup.c so that it compares the name of the directory that it's trying to back up and the setting value of log_directory parameter, then, if they are the same, it just skips the directory. The patch that I sent upthread does this regarding stats_temp_directory. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Fri, Jan 31, 2014 at 8:40 AM, Fujii Masao wrote: > Yeah, I was thinking that, too. I'm not sure whether including log files > in backup really increases the security risk, though. There are already > very important data, i.e., database, in backups. Anyway, since > the amount of log files can be very large and they are not essential > for recovery, it's worth considering whether to exclude them. OTOH, > I'm sure that some users prefer current behavior for some reasons. > So I think that it's better to expose the pg_basebackup option > specifying whether log files are included in backups or not. I don't really see how this can work reliably; pg_log isn't a hard-coded name, but rather a GUC that users can leave set to that value or set to any other value they choose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
2014-01-31 Fujii Masao : > On Fri, Jan 31, 2014 at 10:18 PM, Mitsumasa KONDO > wrote: > > 2014-01-31 Fujii Masao > >> > >> On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander > >> wrote: > >> > On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila > > >> > wrote: > >> >> > >> >> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao > >> >> wrote: > >> >> > Hi, > >> >> > > >> >> > The files in pg_stat_tmp directory don't need to be backed up > because > >> >> > they are > >> >> > basically reset at the archive recovery. So I think it's worth > >> >> > changing pg_basebackup > >> >> > so that it skips any files in pg_stat_tmp directory. Thought? > >> >> > >> >> I think this is good idea, but can't it also avoid > >> >> PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in > >> >> pg_stat_tmp > >> >> > >> > > >> > All stats files should be excluded. IIRC the > >> > PGSTAT_STAT_PERMANENT_TMPFILE > >> > refers to just the global one. You want to exclude based on > >> > PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc > >> > stats_temp_directory if it's in PGDATA. > >> > >> Attached patch changes basebackup.c so that it skips all files in both > >> pg_stat_tmp > >> and stats_temp_directory. Even when a user sets stats_temp_directory > >> to the directory > >> other than pg_stat_tmp, we need to skip the files in pg_stat_tmp. > Because, > >> per recent change of pg_stat_statements, the external query file is > >> always created there. > > > > +1. > > > > And, I'd like to also skip pg_log directory because security reason. > > Yeah, I was thinking that, too. I'm not sure whether including log files > in backup really increases the security risk, though. There are already > very important data, i.e., database, in backups. Anyway, since > the amount of log files can be very large and they are not essential > for recovery, it's worth considering whether to exclude them. OTOH, > I'm sure that some users prefer current behavior for some reasons. > So I think that it's better to expose the pg_basebackup option > specifying whether log files are included in backups or not. I'm with you. Thanks a lot ! Regards, -- Mitsumsasa KONDO NTT Open Source Software Center
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Fri, Jan 31, 2014 at 10:18 PM, Mitsumasa KONDO wrote: > 2014-01-31 Fujii Masao >> >> On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander >> wrote: >> > On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila >> > wrote: >> >> >> >> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao >> >> wrote: >> >> > Hi, >> >> > >> >> > The files in pg_stat_tmp directory don't need to be backed up because >> >> > they are >> >> > basically reset at the archive recovery. So I think it's worth >> >> > changing pg_basebackup >> >> > so that it skips any files in pg_stat_tmp directory. Thought? >> >> >> >> I think this is good idea, but can't it also avoid >> >> PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in >> >> pg_stat_tmp >> >> >> > >> > All stats files should be excluded. IIRC the >> > PGSTAT_STAT_PERMANENT_TMPFILE >> > refers to just the global one. You want to exclude based on >> > PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc >> > stats_temp_directory if it's in PGDATA. >> >> Attached patch changes basebackup.c so that it skips all files in both >> pg_stat_tmp >> and stats_temp_directory. Even when a user sets stats_temp_directory >> to the directory >> other than pg_stat_tmp, we need to skip the files in pg_stat_tmp. Because, >> per recent change of pg_stat_statements, the external query file is >> always created there. > > +1. > > And, I'd like to also skip pg_log directory because security reason. Yeah, I was thinking that, too. I'm not sure whether including log files in backup really increases the security risk, though. There are already very important data, i.e., database, in backups. Anyway, since the amount of log files can be very large and they are not essential for recovery, it's worth considering whether to exclude them. OTOH, I'm sure that some users prefer current behavior for some reasons. So I think that it's better to expose the pg_basebackup option specifying whether log files are included in backups or not. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
2014-01-31 Fujii Masao > On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander > wrote: > > On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila > > wrote: > >> > >> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao > >> wrote: > >> > Hi, > >> > > >> > The files in pg_stat_tmp directory don't need to be backed up because > >> > they are > >> > basically reset at the archive recovery. So I think it's worth > >> > changing pg_basebackup > >> > so that it skips any files in pg_stat_tmp directory. Thought? > >> > >> I think this is good idea, but can't it also avoid > >> PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in > >> pg_stat_tmp > >> > > > > All stats files should be excluded. IIRC the > PGSTAT_STAT_PERMANENT_TMPFILE > > refers to just the global one. You want to exclude based on > > PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc > > stats_temp_directory if it's in PGDATA. > > Attached patch changes basebackup.c so that it skips all files in both > pg_stat_tmp > and stats_temp_directory. Even when a user sets stats_temp_directory > to the directory > other than pg_stat_tmp, we need to skip the files in pg_stat_tmp. Because, > per recent change of pg_stat_statements, the external query file is > always created there. > +1. And, I'd like to also skip pg_log directory because security reason. If you have time and get community agreed, could you create these patch after committed your patch? I don't want to bother you. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander wrote: > On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila > wrote: >> >> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao >> wrote: >> > Hi, >> > >> > The files in pg_stat_tmp directory don't need to be backed up because >> > they are >> > basically reset at the archive recovery. So I think it's worth >> > changing pg_basebackup >> > so that it skips any files in pg_stat_tmp directory. Thought? >> >> I think this is good idea, but can't it also avoid >> PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in >> pg_stat_tmp >> > > All stats files should be excluded. IIRC the PGSTAT_STAT_PERMANENT_TMPFILE > refers to just the global one. You want to exclude based on > PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc > stats_temp_directory if it's in PGDATA. Attached patch changes basebackup.c so that it skips all files in both pg_stat_tmp and stats_temp_directory. Even when a user sets stats_temp_directory to the directory other than pg_stat_tmp, we need to skip the files in pg_stat_tmp. Because, per recent change of pg_stat_statements, the external query file is always created there. Regards, -- Fujii Masao *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 68,73 --- 68,74 #include "parser/analyze.h" #include "parser/parsetree.h" #include "parser/scanner.h" + #include "pgstat.h" #include "storage/fd.h" #include "storage/ipc.h" #include "storage/spin.h" *** *** 89,95 PG_MODULE_MAGIC; * race conditions. Besides, we only expect modest, infrequent I/O for query * strings, so placing the file on a faster filesystem is not compelling. */ ! #define PGSS_TEXT_FILE "pg_stat_tmp/pgss_query_texts.stat" /* Magic number identifying the stats file format */ static const uint32 PGSS_FILE_HEADER = 0x20140125; --- 90,96 * race conditions. Besides, we only expect modest, infrequent I/O for query * strings, so placing the file on a faster filesystem is not compelling. */ ! #define PGSS_TEXT_FILE PG_STAT_TMP_DIR "/pgss_query_texts.stat" /* Magic number identifying the stats file format */ static const uint32 PGSS_FILE_HEADER = 0x20140125; *** a/src/backend/replication/basebackup.c --- b/src/backend/replication/basebackup.c *** *** 25,30 --- 25,31 #include "libpq/pqformat.h" #include "miscadmin.h" #include "nodes/pg_list.h" + #include "pgstat.h" #include "replication/basebackup.h" #include "replication/walsender.h" #include "replication/walsender_private.h" *** *** 63,68 static int compareWalFileNames(const void *a, const void *b); --- 64,72 /* Was the backup currently in-progress initiated in recovery mode? */ static bool backup_started_in_recovery = false; + /* Relative path of temporary statistics directory */ + static char *statrelpath = NULL; + /* * Size of each block sent into the tar stream for larger files. */ *** *** 111,116 perform_base_backup(basebackup_options *opt, DIR *tblspcdir) --- 115,132 &labelfile); SendXlogRecPtrResult(startptr, starttli); + /* + * Calculate the relative path of temporary statistics directory + * in order to skip the files which are located in that directory later. + */ + if (is_absolute_path(pgstat_stat_directory) && + strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0) + statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1); + else if (strncmp(pgstat_stat_directory, "./", 2) != 0) + statrelpath = psprintf("./%s", pgstat_stat_directory); + else + statrelpath = pgstat_stat_directory; + PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0); { List *tablespaces = NIL; *** *** 838,844 sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces) sizeof(PG_AUTOCONF_FILENAME) + 4) == 0) continue; - /* * If there's a backup_label file, it belongs to a backup started by * the user with pg_start_backup(). It is *not* correct for this --- 854,859 *** *** 888,893 sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces) --- 903,922 } /* + * Skip temporary statistics files. PG_STAT_TMP_DIR must be skipped + * even when stats_temp_directory is set because PGSS_TEXT_FILE is + * always created there. + */ + if ((statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0) || + strncmp(de->d_name, PG_STAT_TMP_DIR, strlen(PG_STAT_TMP_DIR)) == 0) + { + if (!sizeonly) + _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf); + size += 512; + continue; + } + + /* * We can skip pg_xlog, the WAL segments need to be fetched from the * WAL archive anyway. But include it as an empty directory anyway, so * we get permissions right. *** a/src/backend/utils/misc/guc.c --- b/
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila wrote: > On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao > wrote: > > Hi, > > > > The files in pg_stat_tmp directory don't need to be backed up because > they are > > basically reset at the archive recovery. So I think it's worth > > changing pg_basebackup > > so that it skips any files in pg_stat_tmp directory. Thought? > > I think this is good idea, but can't it also avoid > PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in > pg_stat_tmp > > All stats files should be excluded. IIRC the PGSTAT_STAT_PERMANENT_TMPFILE refers to just the global one. You want to exclude based on PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc stats_temp_directory if it's in PGDATA. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao wrote: > Hi, > > The files in pg_stat_tmp directory don't need to be backed up because they are > basically reset at the archive recovery. So I think it's worth > changing pg_basebackup > so that it skips any files in pg_stat_tmp directory. Thought? I think this is good idea, but can't it also avoid PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in pg_stat_tmp With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 1:08 PM, Michael Paquier wrote: > On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao wrote: >> Hi, >> >> The files in pg_stat_tmp directory don't need to be backed up because they >> are >> basically reset at the archive recovery. So I think it's worth >> changing pg_basebackup >> so that it skips any files in pg_stat_tmp directory. Thought? > Skipping pgstat_temp_directory in basebackup.c would make more sense > than directly touching pg_basebackup. > My 2c. Yeah, that's what I was thinking :) Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao wrote: > Hi, > > The files in pg_stat_tmp directory don't need to be backed up because they are > basically reset at the archive recovery. So I think it's worth > changing pg_basebackup > so that it skips any files in pg_stat_tmp directory. Thought? Skipping pgstat_temp_directory in basebackup.c would make more sense than directly touching pg_basebackup. My 2c. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers