Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-29 Thread David Steele
On 9/28/16 9:55 PM, Peter Eisentraut wrote:
> On 9/28/16 2:45 AM, Michael Paquier wrote:
>> After all that fixed, I have moved the patch to "Ready for Committer".
>> Please use the updated patch though.
> 
> Committed after some cosmetic changes.

Thank you, Peter!

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-28 Thread Peter Eisentraut
On 9/28/16 2:45 AM, Michael Paquier wrote:
> After all that fixed, I have moved the patch to "Ready for Committer".
> Please use the updated patch though.

Committed after some cosmetic changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-28 Thread David Steele
On 9/28/16 2:45 AM, Michael Paquier wrote:
> On Tue, Sep 27, 2016 at 11:27 PM, David Steele  wrote:
>> On 9/26/16 2:36 AM, Michael Paquier wrote:
>>
>>> Just a nit:
>>>  
>>> - postmaster.pid
>>> + postmaster.pid and postmaster.opts
>>>  
>>> Having one  block for each file would be better.
>>
>> OK, changed.
> 
> You did not actually change it :)

Oops, this was intended to mean that I changed:

>>> +const char *excludeFile[] =
>>> excludeFiles[]?

> +
> + backup_label and tablespace_map.  If these
> + files exist they belong to an exclusive backup and are not 
> applicable
> + to the base backup.
> +
> This is a bit confusing. When using pg_basebackup the files are
> ignored, right, but they are included in the tar stream so they are
> not excluded at the end. So we had better remove purely this
> paragraph. Similarly, postgresql.auto.conf.tmp is something that
> exists only for a short time frame. Not listing it directly is fine
> IMO.

OK, I agree that's confusing and probably not helpful to the user.

> +   /* If symlink, write it as a directory anyway */
> +#ifndef WIN32
> +   if (S_ISLNK(statbuf->st_mode))
> +#else
> +   if (pgwin32_is_junction(pathbuf))
> +#endif
> +
> +   statbuf->st_mode = S_IFDIR | S_IRWXU;
> Indentation here is confusing. Coverity would complain.

OK.

> +   /* Stat the file */
> +   snprintf(pathbuf, MAXPGPATH, "%s/%s", path, de->d_name);
> There is no need to put the stat() call this early in the loop as this
> is just used for re-creating empty directories.

OK.

> +if (strcmp(pathbuf + basepathlen + 1,
> +   excludeFiles[excludeIdx]) == 0)
> There is no need for complicated maths, you can just use de->d_name.

OK.

> pg_xlog has somewhat a similar treatment, but per the exception
> handling of archive_status it is better to leave its check as-is.

OK.

> The DEBUG1 entries are really useful for debugging, it would be nice
> to keep them in the final patch.

That was my thinking as well.  It's up to Peter, though.

> Thinking harder, your logic can be simplified. You could just do the 
> following:
> - Check for interrupts first
> - Look at the list of excluded files
> - Call lstat()
> - Check for excluded directories

I think setting excludeFound = false the second time is redundant since
it must be false at that point.  Am I missing something or are you just
being cautious in case code changes above?

> After all that fixed, I have moved the patch to "Ready for Committer".
> Please use the updated patch though.

The v6 patch looks good to me.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-28 Thread Michael Paquier
On Tue, Sep 27, 2016 at 11:27 PM, David Steele  wrote:
> On 9/26/16 2:36 AM, Michael Paquier wrote:
>
>> Just a nit:
>>  
>> - postmaster.pid
>> + postmaster.pid and postmaster.opts
>>  
>> Having one  block for each file would be better.
>
> OK, changed.

You did not actually change it :)

>> +const char *excludeFile[] =
>> excludeFiles[]?
>>
>> +# Move pg_replslot out of $pgdata and create a symlink to it
>> +rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
>> +   or die "unable to move $pgdata/pg_replslot";
>> +symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot");
>> This will blow up on Windows. Those tests need to be included in the
>> SKIP block. Even if your code needs to support junction points on
>> Windows, as perl does not offer an equivalent for it we just cannot
>> test it on this platform.
>
> Fixed.

Thanks for the updated version.

+
+ backup_label and tablespace_map.  If these
+ files exist they belong to an exclusive backup and are not applicable
+ to the base backup.
+
This is a bit confusing. When using pg_basebackup the files are
ignored, right, but they are included in the tar stream so they are
not excluded at the end. So we had better remove purely this
paragraph. Similarly, postgresql.auto.conf.tmp is something that
exists only for a short time frame. Not listing it directly is fine
IMO.

+   /* If symlink, write it as a directory anyway */
+#ifndef WIN32
+   if (S_ISLNK(statbuf->st_mode))
+#else
+   if (pgwin32_is_junction(pathbuf))
+#endif
+
+   statbuf->st_mode = S_IFDIR | S_IRWXU;
Indentation here is confusing. Coverity would complain.

+   /* Stat the file */
+   snprintf(pathbuf, MAXPGPATH, "%s/%s", path, de->d_name);
There is no need to put the stat() call this early in the loop as this
is just used for re-creating empty directories.

+if (strcmp(pathbuf + basepathlen + 1,
+   excludeFiles[excludeIdx]) == 0)
There is no need for complicated maths, you can just use de->d_name.

pg_xlog has somewhat a similar treatment, but per the exception
handling of archive_status it is better to leave its check as-is.

The DEBUG1 entries are really useful for debugging, it would be nice
to keep them in the final patch.

Thinking harder, your logic can be simplified. You could just do the following:
- Check for interrupts first
- Look at the list of excluded files
- Call lstat()
- Check for excluded directories

After all that fixed, I have moved the patch to "Ready for Committer".
Please use the updated patch though.
-- 
Michael
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0f09d82..a8daa07 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1090,6 +1090,22 @@ SELECT pg_stop_backup();

 

+The contents of the pg_dynshmem/, pg_stat_tmp/,
+pg_notify/, pg_serial/,
+pg_snapshots/, and pg_subtrans/ directories can
+be omitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also be omitted.
+   
+
+   
+Any file or directory beginning with pgsql_tmp can be
+omitted from the backup.  These files are removed on postmaster start and
+the directories will be recreated as needed.
+   
+
+   
 The backup label
 file includes the label string you gave to pg_start_backup,
 as well as the time at which pg_start_backup was run, and
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 68b0941..fe4d511 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2069,7 +2069,9 @@ The commands accepted in walsender mode are:


 
- various temporary files created during the operation of the PostgreSQL server
+ Various temporary files and directories created during the operation of
+ the PostgreSQL server, i.e. any file or directory beginning with
+ pgsql_tmp.
 


@@ -2082,7 +2084,11 @@ The commands accepted in walsender mode are:


 
- pg_replslot is copied as an empty directory.
+ pg_replslot, pg_dynshmem,
+ pg_stat_tmp, pg_notify,
+ pg_serial, pg_snapshots, and
+ pg_subtrans are copied as empty directories (even if they
+ are symbolic links).
 


diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9f1eae1..984ea5b 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -610,10 +610,8 @@ PostgreSQL documentation
   
The backup will include all files in the data directory and tablespaces,
including the configuration files and any additional files placed in the
-   directory by third parties. But only regular files and directories are
-   

Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-27 Thread David Steele
On 9/26/16 2:36 AM, Michael Paquier wrote:

> Just a nit:
>  
> - postmaster.pid
> + postmaster.pid and postmaster.opts
>  
> Having one  block for each file would be better.

OK, changed.

> +const char *excludeFile[] =
> excludeFiles[]?
> 
> +# Move pg_replslot out of $pgdata and create a symlink to it
> +rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
> +   or die "unable to move $pgdata/pg_replslot";
> +symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot");
> This will blow up on Windows. Those tests need to be included in the
> SKIP block. Even if your code needs to support junction points on
> Windows, as perl does not offer an equivalent for it we just cannot
> test it on this platform.

Fixed.

-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0f09d82..a8daa07 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1090,6 +1090,22 @@ SELECT pg_stop_backup();

 

+The contents of the pg_dynshmem/, pg_stat_tmp/,
+pg_notify/, pg_serial/,
+pg_snapshots/, and pg_subtrans/ directories can
+be omitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also be omitted.
+   
+
+   
+Any file or directory beginning with pgsql_tmp can be
+omitted from the backup.  These files are removed on postmaster start and
+the directories will be recreated as needed.
+   
+
+   
 The backup label
 file includes the label string you gave to pg_start_backup,
 as well as the time at which pg_start_backup was run, and
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 68b0941..d65687f 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2059,17 +2059,26 @@ The commands accepted in walsender mode are:
   

 
- postmaster.pid
+ postmaster.pid and postmaster.opts
 


 
- postmaster.opts
+ postgresql.auto.conf.tmp
 


 
- various temporary files created during the operation of the 
PostgreSQL server
+ backup_label and tablespace_map.  If these
+ files exist they belong to an exclusive backup and are not applicable
+ to the base backup.
+
+   
+   
+
+ Various temporary files and directories created during the operation 
of
+ the PostgreSQL server, i.e. any file or directory beginning with
+ pgsql_tmp.
 


@@ -2082,7 +2091,11 @@ The commands accepted in walsender mode are:


 
- pg_replslot is copied as an empty directory.
+ pg_replslot, pg_dynshmem,
+ pg_stat_tmp, pg_notify,
+ pg_serial, pg_snapshots, and
+ pg_subtrans are copied as empty directories (even if they
+ are symbolic links).
 


diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index 9f1eae1..984ea5b 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -610,10 +610,8 @@ PostgreSQL documentation
   
The backup will include all files in the data directory and tablespaces,
including the configuration files and any additional files placed in the
-   directory by third parties. But only regular files and directories are
-   copied.  Symbolic links (other than those used for tablespaces) and special
-   device files are skipped.  (See  for
-   the precise details.)
+   directory by third parties, with certain exceptions.  (See
+for the complete list of exceptions.)
   
 
   
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index da9b7a6..8412472 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,6 +30,7 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/builtins.h"
@@ -55,8 +56,10 @@ static int64 sendDir(char *path, int basepathlen, bool 
sizeonly,
 static bool sendFile(char *readfilename, char *tarfilename,
 struct stat * statbuf, bool missing_ok);
 static void sendFileWithContent(const char *filename, const char *content);
-static void _tarWriteHeader(const char *filename, const char *linktarget,
-   struct stat * statbuf);
+static int64 _tarWriteHeader(const char *filename, const char *linktarget,
+   struct stat * statbuf, bool sizeonly);
+static int64 _tarWriteDir(char *pathbuf, int basepathlen, struct stat *statbuf,
+   bool sizeonly);
 static void 

Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-27 Thread David Steele
On 9/26/16 2:36 AM, Michael Paquier wrote:
> 
> Just a nit:
>  
> - postmaster.pid
> + postmaster.pid and postmaster.opts
>  
> Having one  block for each file would be better.

I grouped these because they are related and because there are now other
bullets where multiple files/dirs are listed.

Are you proposing to also breakup pg_replslot,
pg_dynshmem, pg_stat_tmp,
pg_notify, pg_serial,
pg_snapshots, and pg_subtrans?

Also backup_label and tablespace_map?

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-26 Thread Michael Paquier
On Sat, Sep 24, 2016 at 3:26 AM, David Steele  wrote:
> On 9/12/16 2:50 PM, Peter Eisentraut wrote:
>> The comments on excludeDirContents are somewhat imprecise.  For
>> example, none of those directories are actually removed or recreated
>> on startup, only the contents are.
>
> Fixed.
>
>> The comment for pg_replslot is incorrect.  I think you can copy
>> replication slots just fine, but you usually don't want to.
>
> Fixed.
>
>> What is the 2 for in this code?
>>
>> if (strcmp(pathbuf + 2, excludeFile[excludeIdx]) == 0)
>
> This should be basepathlen + 1 (i.e., $PGDATA/).  Fixed.
>
>> I would keep the signature of _tarWriteDir() similar to
>> _tarWriteHeader().  So don't pass sizeonly in there, or do pass in
>> sizeonly but do the same with _tarWriteHeader().
>
> I'm not sure how much more similar they can be, but I do agree that
> sizeonly logic should be wrapped in _tarWriteHeader().  Done.
>
>> The documentation in pg_basebackup.sgml and protocol.sgml should be
>> updated.
>
> Done.  I moved a few things to the protocol docs to avoid repetition.
>
>> Add some tests.  At least test that one entry from the directory list
>> and one entry from the files list is not contained in the backup
>> result.
>
> Done for all files and directories.
>
>> Testing the symlink handling would also be good.
>
> Done using pg_replslot for the test.
>
>> (The
>> pg_basebackup tests claim that Windows doesn't support symlinks and
>> therefore skip all the symlink tests.  That seems a bit at odds with
>> your code handling symlinks on Windows.)
>
> Hopefully Michael's response down-thread answered your question.

Just a nit:
 
- postmaster.pid
+ postmaster.pid and postmaster.opts
 
Having one  block for each file would be better.

+const char *excludeFile[] =
excludeFiles[]?

+# Move pg_replslot out of $pgdata and create a symlink to it
+rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+   or die "unable to move $pgdata/pg_replslot";
+symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot");
This will blow up on Windows. Those tests need to be included in the
SKIP block. Even if your code needs to support junction points on
Windows, as perl does not offer an equivalent for it we just cannot
test it on this platform.

Except that, the patch looks pretty good to me.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-23 Thread David Steele
On 9/12/16 2:50 PM, Peter Eisentraut wrote:
> The comments on excludeDirContents are somewhat imprecise.  For
> example, none of those directories are actually removed or recreated
> on startup, only the contents are.

Fixed.

> The comment for pg_replslot is incorrect.  I think you can copy
> replication slots just fine, but you usually don't want to.

Fixed.

> What is the 2 for in this code?
> 
> if (strcmp(pathbuf + 2, excludeFile[excludeIdx]) == 0)

This should be basepathlen + 1 (i.e., $PGDATA/).  Fixed.

> I would keep the signature of _tarWriteDir() similar to
> _tarWriteHeader().  So don't pass sizeonly in there, or do pass in
> sizeonly but do the same with _tarWriteHeader().

I'm not sure how much more similar they can be, but I do agree that
sizeonly logic should be wrapped in _tarWriteHeader().  Done.

> The documentation in pg_basebackup.sgml and protocol.sgml should be
> updated.

Done.  I moved a few things to the protocol docs to avoid repetition.

> Add some tests.  At least test that one entry from the directory list
> and one entry from the files list is not contained in the backup
> result.

Done for all files and directories.

> Testing the symlink handling would also be good.

Done using pg_replslot for the test.

> (The
> pg_basebackup tests claim that Windows doesn't support symlinks and
> therefore skip all the symlink tests.  That seems a bit at odds with
> your code handling symlinks on Windows.)

Hopefully Michael's response down-thread answered your question.

-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0f09d82..a8daa07 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1090,6 +1090,22 @@ SELECT pg_stop_backup();

 

+The contents of the pg_dynshmem/, pg_stat_tmp/,
+pg_notify/, pg_serial/,
+pg_snapshots/, and pg_subtrans/ directories can
+be omitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also be omitted.
+   
+
+   
+Any file or directory beginning with pgsql_tmp can be
+omitted from the backup.  These files are removed on postmaster start and
+the directories will be recreated as needed.
+   
+
+   
 The backup label
 file includes the label string you gave to pg_start_backup,
 as well as the time at which pg_start_backup was run, and
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 68b0941..d65687f 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2059,17 +2059,26 @@ The commands accepted in walsender mode are:
   

 
- postmaster.pid
+ postmaster.pid and postmaster.opts
 


 
- postmaster.opts
+ postgresql.auto.conf.tmp
 


 
- various temporary files created during the operation of the 
PostgreSQL server
+ backup_label and tablespace_map.  If these
+ files exist they belong to an exclusive backup and are not applicable
+ to the base backup.
+
+   
+   
+
+ Various temporary files and directories created during the operation 
of
+ the PostgreSQL server, i.e. any file or directory beginning with
+ pgsql_tmp.
 


@@ -2082,7 +2091,11 @@ The commands accepted in walsender mode are:


 
- pg_replslot is copied as an empty directory.
+ pg_replslot, pg_dynshmem,
+ pg_stat_tmp, pg_notify,
+ pg_serial, pg_snapshots, and
+ pg_subtrans are copied as empty directories (even if they
+ are symbolic links).
 


diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index 9f1eae1..984ea5b 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -610,10 +610,8 @@ PostgreSQL documentation
   
The backup will include all files in the data directory and tablespaces,
including the configuration files and any additional files placed in the
-   directory by third parties. But only regular files and directories are
-   copied.  Symbolic links (other than those used for tablespaces) and special
-   device files are skipped.  (See  for
-   the precise details.)
+   directory by third parties, with certain exceptions.  (See
+for the complete list of exceptions.)
   
 
   
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index da9b7a6..04909ef 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,6 +30,7 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include 

Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-12 Thread Michael Paquier
On Tue, Sep 13, 2016 at 3:50 AM, Peter Eisentraut
 wrote:
> Add some tests.  At least test that one entry from the directory list
> and one entry from the files list is not contained in the backup
> result.  Testing the symlink handling would also be good.  (The
> pg_basebackup tests claim that Windows doesn't support symlinks and
> therefore skip all the symlink tests.  That seems a bit at odds with
> your code handling symlinks on Windows.)

The code proposed needs to support junction points on Windows so from
this side of things everything is fine. What is lacking here is
support for symlink() in perl for Windows, and that's why the tests
are skipped.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-12 Thread Peter Eisentraut
The comments on excludeDirContents are somewhat imprecise.  For
example, none of those directories are actually removed or recreated
on startup, only the contents are.

The comment for pg_replslot is incorrect.  I think you can copy
replication slots just fine, but you usually don't want to.

What is the 2 for in this code?

if (strcmp(pathbuf + 2, excludeFile[excludeIdx]) == 0)

I would keep the signature of _tarWriteDir() similar to
_tarWriteHeader().  So don't pass sizeonly in there, or do pass in
sizeonly but do the same with _tarWriteHeader().

The documentation in pg_basebackup.sgml and protocol.sgml should be
updated.

Add some tests.  At least test that one entry from the directory list
and one entry from the files list is not contained in the backup
result.  Testing the symlink handling would also be good.  (The
pg_basebackup tests claim that Windows doesn't support symlinks and
therefore skip all the symlink tests.  That seems a bit at odds with
your code handling symlinks on Windows.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-10 Thread Michael Paquier
On Fri, Sep 9, 2016 at 10:18 PM, David Steele  wrote:
> On 9/6/16 10:25 PM, Michael Paquier wrote:
>>
>>
>> On Wed, Sep 7, 2016 at 12:16 AM, David Steele  wrote:
>>
>>> Attached is a new patch that adds sgml documentation.  I can expand on
>>> each
>>> directory individually if you think that's necessary, but thought it was
>>> better to lump them into a few categories.
>>
>>
>> +be ommitted from the backup as they will be initialized on postmaster
>> +startup. If the  is set and
>> is
>> +under the database cluster directory then the contents of the
>> directory
>> +specified by  can also
>> be ommitted.
>>
>> s/ommitted/omitted/
>
>
> Fixed.
>
>> +#define EXCLUDE_DIR_MAX 8
>> +#define EXCLUDE_DIR_STAT_TMP 0
>> +
>> +const char *excludeDirContents[EXCLUDE_DIR_MAX] =
>> +{
>> +/*
>> + * Skip temporary statistics files. The first array position will be
>> + * filled with the value of pgstat_stat_directory relative to PGDATA.
>> + * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is
>> set
>> + * because PGSS_TEXT_FILE is always created there.
>> + */
>> +NULL,
>> I find that ugly. I'd rather use an array with undefined size for the
>> fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
>> EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
>> _tarWriteHeader...
>
>
> Done.  Also writing out pg_xlog with the new routine.

Thanks, this looks in far better shape now.

+   /* Removed on startup, see DeleteAllExportedSnapshotFiles(). */
+   "pg_snapshots",
Using SNAPSHOT_EXPORT_DIR is possible here.

+_tarWriteDir(char *pathbuf, int basepathlen, struct stat *statbuf,
+bool sizeonly)
That's nice, thanks! This even takes care of the fact when directories
other than pg_xlog are symlinks or junction points.

+   /* Recreated on startup, see StartupReplicationSlots(). */
+   "pg_replslot",
This comment is incorrect, pg_replslot is skipped in a base backup
because that's just useless.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-09 Thread David Steele

On 9/6/16 10:25 PM, Michael Paquier wrote:
>

On Wed, Sep 7, 2016 at 12:16 AM, David Steele  wrote:


Attached is a new patch that adds sgml documentation.  I can expand on each
directory individually if you think that's necessary, but thought it was
better to lump them into a few categories.


+be ommitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also
be ommitted.

s/ommitted/omitted/


Fixed.


+#define EXCLUDE_DIR_MAX 8
+#define EXCLUDE_DIR_STAT_TMP 0
+
+const char *excludeDirContents[EXCLUDE_DIR_MAX] =
+{
+/*
+ * Skip temporary statistics files. The first array position will be
+ * filled with the value of pgstat_stat_directory relative to PGDATA.
+ * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
+ * because PGSS_TEXT_FILE is always created there.
+ */
+NULL,
I find that ugly. I'd rather use an array with undefined size for the
fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
_tarWriteHeader...


Done.  Also writing out pg_xlog with the new routine.

--
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0f09d82..a8daa07 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1090,6 +1090,22 @@ SELECT pg_stop_backup();

 

+The contents of the pg_dynshmem/, pg_stat_tmp/,
+pg_notify/, pg_serial/,
+pg_snapshots/, and pg_subtrans/ directories can
+be omitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also be omitted.
+   
+
+   
+Any file or directory beginning with pgsql_tmp can be
+omitted from the backup.  These files are removed on postmaster start and
+the directories will be recreated as needed.
+   
+
+   
 The backup label
 file includes the label string you gave to pg_start_backup,
 as well as the time at which pg_start_backup was run, and
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index da9b7a6..a441ae2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,6 +30,7 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/builtins.h"
@@ -57,6 +58,8 @@ static bool sendFile(char *readfilename, char *tarfilename,
 static void sendFileWithContent(const char *filename, const char *content);
 static void _tarWriteHeader(const char *filename, const char *linktarget,
struct stat * statbuf);
+static int64 _tarWriteDir(char *pathbuf, int basepathlen, struct stat *statbuf,
+   bool sizeonly);
 static void send_int8_string(StringInfoData *buf, int64 intval);
 static void SendBackupHeader(List *tablespaces);
 static void base_backup_cleanup(int code, Datum arg);
@@ -95,6 +98,67 @@ static int64 elapsed_min_unit;
 static int64 throttled_last;
 
 /*
+ * The contents of these directories are removed or recreated during server
+ * start so they will not be included in the backup.  The directory entry
+ * will be included to preserve permissions.
+ */
+const char *excludeDirContents[] =
+{
+   /*
+* Skip temporary statistics files. PG_STAT_TMP_DIR must be skipped even
+* when stats_temp_directory is set because PGSS_TEXT_FILE is always 
created
+* there.
+*/
+   PG_STAT_TMP_DIR,
+
+   /* Recreated on startup, see StartupReplicationSlots(). */
+   "pg_replslot",
+
+   /* Removed on startup, see dsm_cleanup_for_mmap(). */
+   PG_DYNSHMEM_DIR,
+
+   /* Recreated/zeroed on startup, see AsyncShmemInit(). */
+   "pg_notify",
+
+   /* Recreated on startup, see OldSerXidInit(). */
+   "pg_serial",
+
+   /* Removed on startup, see DeleteAllExportedSnapshotFiles(). */
+   "pg_snapshots",
+
+   /* Recreated/zeroed on startup, see StartupSUBTRANS(). */
+   "pg_subtrans",
+
+   /* Terminate list. */
+   NULL
+};
+
+/*
+ * Files that should not be included in the backup.
+ */
+const char *excludeFile[] =
+{
+   /* Skip auto conf temporary file. */
+   PG_AUTOCONF_FILENAME ".tmp",
+
+   /*
+* If there's a backup_label or tablespace_map file, it belongs to a 
backup
+* started by the user with pg_start_backup(). It is *not* correct for 
this
+* backup, our backup_label/tablespace_map is injected into the tar
+* separately.
+*/
+   BACKUP_LABEL_FILE,
+   TABLESPACE_MAP,
+
+   /* Skip 

Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-07 Thread David Steele
On 9/7/16 8:21 AM, David Steele wrote:
> On 9/6/16 10:25 PM, Michael Paquier wrote:
>> I find that ugly. I'd rather use an array with undefined size for the
>> fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
>> EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
>> _tarWriteHeader...
> 
> My goal was to be able to fully reuse the code that creates the paths,
> but this could also be done by following your suggestion and also moving
> the path code into a function.

I just realized all I did was repeat what you said...

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-07 Thread David Steele
On 9/6/16 10:25 PM, Michael Paquier wrote:
> On Wed, Sep 7, 2016 at 12:16 AM, David Steele  wrote:
>> Attached is a new patch that adds sgml documentation.  I can expand on each
>> directory individually if you think that's necessary, but thought it was
>> better to lump them into a few categories.
> 
> +be ommitted from the backup as they will be initialized on postmaster
> +startup. If the  is set and is
> +under the database cluster directory then the contents of the directory
> +specified by  can also
> be ommitted.
> 
> s/ommitted/omitted/

Thanks!

> +#define EXCLUDE_DIR_MAX 8
> +#define EXCLUDE_DIR_STAT_TMP 0
> +
> +const char *excludeDirContents[EXCLUDE_DIR_MAX] =
> +{
> +/*
> + * Skip temporary statistics files. The first array position will be
> + * filled with the value of pgstat_stat_directory relative to PGDATA.
> + * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
> + * because PGSS_TEXT_FILE is always created there.
> + */
> +NULL,
> I find that ugly. I'd rather use an array with undefined size for the
> fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
> EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
> _tarWriteHeader...

My goal was to be able to fully reuse the code that creates the paths,
but this could also be done by following your suggestion and also moving
the path code into a function.

Any opinion on this, Peter?

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-06 Thread Michael Paquier
On Wed, Sep 7, 2016 at 12:16 AM, David Steele  wrote:
> On 9/1/16 9:53 AM, Peter Eisentraut wrote:
>>
>> On 8/15/16 3:39 PM, David Steele wrote:
>>>
>>> That patch got me thinking about what else could be excluded and after
>>> some investigation I found the following: pg_notify, pg_serial,
>>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>>> or rebuilt on server start.
>>>
>>> The attached patch adds these directories to the pg_basebackup
>>> exclusions and takes an array-based approach to excluding directories
>>> and files during backup.
>>
>>
>> We do support other backup methods besides using pg_basebackup.  So I
>> think we need to document the required or recommended handling of each
>> of these directories.  And then pg_basebackup should become a consumer
>> of that documentation.
>>
>> The current documentation on this is at
>>
>> ,
>> which covers pg_xlog and pg_replslot.  I think that documentation should
>> be expanded, maybe with a simple list that is easy to copy into an
>> exclude file, following by more detail on each directory.
>
>
> Attached is a new patch that adds sgml documentation.  I can expand on each
> directory individually if you think that's necessary, but thought it was
> better to lump them into a few categories.

+be ommitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also
be ommitted.

s/ommitted/omitted/


+#define EXCLUDE_DIR_MAX 8
+#define EXCLUDE_DIR_STAT_TMP 0
+
+const char *excludeDirContents[EXCLUDE_DIR_MAX] =
+{
+/*
+ * Skip temporary statistics files. The first array position will be
+ * filled with the value of pgstat_stat_directory relative to PGDATA.
+ * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
+ * because PGSS_TEXT_FILE is always created there.
+ */
+NULL,
I find that ugly. I'd rather use an array with undefined size for the
fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
_tarWriteHeader...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-06 Thread David Steele

On 9/1/16 9:53 AM, Peter Eisentraut wrote:

On 8/15/16 3:39 PM, David Steele wrote:

That patch got me thinking about what else could be excluded and after
some investigation I found the following: pg_notify, pg_serial,
pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
or rebuilt on server start.

The attached patch adds these directories to the pg_basebackup
exclusions and takes an array-based approach to excluding directories
and files during backup.


We do support other backup methods besides using pg_basebackup.  So I
think we need to document the required or recommended handling of each
of these directories.  And then pg_basebackup should become a consumer
of that documentation.

The current documentation on this is at
,
which covers pg_xlog and pg_replslot.  I think that documentation should
be expanded, maybe with a simple list that is easy to copy into an
exclude file, following by more detail on each directory.


Attached is a new patch that adds sgml documentation.  I can expand on 
each directory individually if you think that's necessary, but thought 
it was better to lump them into a few categories.


--
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0f09d82..b0f9556 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1090,6 +1090,22 @@ SELECT pg_stop_backup();

 

+The contents of the pg_dynshmem/, pg_stat_tmp/,
+pg_notify/, pg_serial/,
+pg_snapshots/, and pg_subtrans/ directories can
+be ommitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also be 
ommitted.
+   
+
+   
+Any file or directory beginning with pgsql_tmp can be
+omitted from the backup.  These files are removed on postmaster start and
+the directories will be recreated as needed.
+   
+
+   
 The backup label
 file includes the label string you gave to pg_start_backup,
 as well as the time at which pg_start_backup was run, and
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index da9b7a6..830d332 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,6 +30,7 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/builtins.h"
@@ -69,9 +70,6 @@ static void throttle(size_t increment);
 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
 
-/* Relative path of temporary statistics directory */
-static char *statrelpath = NULL;
-
 /*
  * Size of each block sent into the tar stream for larger files.
  */
@@ -95,6 +93,68 @@ static int64 elapsed_min_unit;
 static int64 throttled_last;
 
 /*
+ * The contents of these directories are removed or recreated during server
+ * start so they will not be included in the backup.  The directory entry
+ * will be included to preserve permissions.
+ */
+#define EXCLUDE_DIR_MAX8
+#define EXCLUDE_DIR_STAT_TMP   0
+
+const char *excludeDirContents[EXCLUDE_DIR_MAX] =
+{
+   /*
+* Skip temporary statistics files. The first array position will be
+* filled with the value of pgstat_stat_directory relative to PGDATA.
+* PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
+* because PGSS_TEXT_FILE is always created there.
+*/
+   NULL,
+   PG_STAT_TMP_DIR,
+
+   /* Recreated on startup, see StartupReplicationSlots(). */
+   "pg_replslot",
+
+   /* Removed on startup, see dsm_cleanup_for_mmap(). */
+   PG_DYNSHMEM_DIR,
+
+   /* Recreated/zeroed on startup, see AsyncShmemInit(). */
+   "pg_notify",
+
+   /* Recreated on startup, see OldSerXidInit(). */
+   "pg_serial",
+
+   /* Removed on startup, see DeleteAllExportedSnapshotFiles(). */
+   "pg_snapshots",
+
+   /* Recreated/zeroed on startup, see StartupSUBTRANS(). */
+   "pg_subtrans"
+};
+
+/*
+ * Files that should not be included in the backup.
+ */
+#define EXCLUDE_FILE_MAX   5
+
+const char *excludeFile[EXCLUDE_FILE_MAX] =
+{
+   /* Skip auto conf temporary file. */
+   PG_AUTOCONF_FILENAME ".tmp",
+
+   /*
+* If there's a backup_label or tablespace_map file, it belongs to a 
backup
+* started by the user with pg_start_backup(). It is *not* correct for 
this
+* backup, our backup_label/tablespace_map is injected into the tar
+* separately.
+*/
+   BACKUP_LABEL_FILE,
+   TABLESPACE_MAP,
+
+   /* Skip postmaster.pid and postmaster.opts. */
+   

Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-01 Thread Peter Eisentraut
On 8/15/16 3:39 PM, David Steele wrote:
> That patch got me thinking about what else could be excluded and after
> some investigation I found the following: pg_notify, pg_serial,
> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
> or rebuilt on server start.
> 
> The attached patch adds these directories to the pg_basebackup
> exclusions and takes an array-based approach to excluding directories
> and files during backup.

We do support other backup methods besides using pg_basebackup.  So I
think we need to document the required or recommended handling of each
of these directories.  And then pg_basebackup should become a consumer
of that documentation.

The current documentation on this is at
,
which covers pg_xlog and pg_replslot.  I think that documentation should
be expanded, maybe with a simple list that is easy to copy into an
exclude file, following by more detail on each directory.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-18 Thread David Steele
On 8/17/16 7:56 PM, Michael Paquier wrote:
> On Thu, Aug 18, 2016 at 1:35 AM, Alvaro Herrera
>  wrote:
>> I don't remember how pg_snapshot works, but it's probably fine
>> to start with an empty subdir (is it possible to export a snapshot from
>> a prepared transaction?)
> 
> From xact.c:
> /*
>  * Likewise, don't allow PREPARE after pg_export_snapshot.  This could be
>  * supported if we added cleanup logic to twophase.c, but for now it
>  * doesn't seem worth the trouble.
>  */
> if (XactHasExportedSnapshots())
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("cannot PREPARE a transaction that has exported snapshots")));
> So that's fine.

Thank you for tracking that down, Michael.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-17 Thread Michael Paquier
On Thu, Aug 18, 2016 at 1:35 AM, Alvaro Herrera
 wrote:
> I don't remember how pg_snapshot works, but it's probably fine
> to start with an empty subdir (is it possible to export a snapshot from
> a prepared transaction?)

>From xact.c:
/*
 * Likewise, don't allow PREPARE after pg_export_snapshot.  This could be
 * supported if we added cleanup logic to twophase.c, but for now it
 * doesn't seem worth the trouble.
 */
if (XactHasExportedSnapshots())
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
So that's fine.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-17 Thread David Steele
On 8/17/16 2:53 PM, Robert Haas wrote:
> On Wed, Aug 17, 2016 at 2:50 PM, David Steele  wrote:
>> Hi Robert,
>>
>> On 8/17/16 11:27 AM, Robert Haas wrote:
>>> On Mon, Aug 15, 2016 at 3:39 PM, David Steele  wrote:
 Recently a hacker proposed a patch to add pg_dynshmem to the list of
 directories whose contents are excluded in pg_basebackup.  I wasn't able
 to find the original email despite several attempts.

 That patch got me thinking about what else could be excluded and after
 some investigation I found the following: pg_notify, pg_serial,
 pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
 or rebuilt on server start.
>>>
>>> Eh ... I doubt very much that it's safe to blow away the entire
>>> contents of an SLRU between shutdown and startup, even if the data is
>>> technically transient data that won't be needed again after the system
>>> is reset.
>>
>> If you are correct it may indicate a problem anyway. Consider a standby
>> backup where the files in these directories may be incredibly stale
>> since they are not replicated.  Once restored to a master should we
>> trust anything in these files?
>>
>> pg_serial, pg_notify, pg_subtrans are not even fsync'd
>> (SlruCtl->do_fsync = false).  It's hard to imagine there's anything of
>> value in there or that it can be trusted if there is.
> 
> It's not just a question of whether the data has value; it's a
> question of whether the SLRU code will handle the situation correctly
> in all cases if the directory contains no files.  I don't think you
> can draw a firm conclusion on that without reading the code.

A good point, Robert.  I read the code but I should have shared my
findings in the original post. I'll only cover the the cases we have not
already decided were safe (those that explicitly delete files in their
init routines, pg_snapshots and pg_dynshmem).

First, SlruPhysicalWritePage() will create a file/page that does not
exist and SlruPhysicalReadPage() will return zeros for a file/page that
does not exist during recovery.  Not that I think the latter is
important for pg_serial, pg_notify, or pg_subtrans since they are not
WAL-logged.  As far as I can see the slru system is very resilient overall.

For pg_notify, AsyncShmemInit() explicitly deletes all files on startup.

For pg_subtrans, StartupSUBTRANS() explicitly zeroes all required pages
but does not flush them to disk.  Since SlruPhysicalWritePage() is
tolerant of missing files/pages this should be fine.

For pg_serial, OldSerXidInit() doesn't zero anything out since it
ignores the old transactions and they get truncated as the transaction
horizon moves.  The old data hangs around for a little while so it could
theoretically be used for debugging as Kevin notes.  Since pg_serial
would only be cleared after a restore I don't see that as a big issue.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-17 Thread Kevin Grittner
On Wed, Aug 17, 2016 at 1:50 PM, David Steele  wrote:

>>> That patch got me thinking about what else could be excluded and after
>>> some investigation I found the following: pg_notify, pg_serial,
>>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>>> or rebuilt on server start.
>>
>> Eh ... I doubt very much that it's safe to blow away the entire
>> contents of an SLRU between shutdown and startup, even if the data is
>> technically transient data that won't be needed again after the system
>> is reset.
>
> I've done pretty extensive testing in pgBackRest and haven't seen issues
> in any supported version (plus I audited each init() function for every
> version back to where it was introduced).  The patch also passes all the
> pg_basebackup TAP tests in master.
>
> If you are correct it may indicate a problem anyway. Consider a standby
> backup where the files in these directories may be incredibly stale
> since they are not replicated.  Once restored to a master should we
> trust anything in these files?
>
> pg_serial, pg_notify, pg_subtrans are not even fsync'd
> (SlruCtl->do_fsync = false).  It's hard to imagine there's anything of
> value in there or that it can be trusted if there is.

The contents of pg_serial could be deleted safely.  As I recall, I
initially had it cleaned out on startup, but Heikki (who was the
main committer for this feature) added SLRU buffer flushing for it
on checkpoint and took out the startup delete code with the
explanation that he thought someone might want to look at the file
contents for debugging purposes.  I would be a bit surprised if
anyone ever has used it for that, but that is the reason the
contents are not deleted just like pg_snapshot and pg_dynshmem.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-17 Thread Robert Haas
On Wed, Aug 17, 2016 at 2:50 PM, David Steele  wrote:
> Hi Robert,
>
> On 8/17/16 11:27 AM, Robert Haas wrote:
>> On Mon, Aug 15, 2016 at 3:39 PM, David Steele  wrote:
>>> Recently a hacker proposed a patch to add pg_dynshmem to the list of
>>> directories whose contents are excluded in pg_basebackup.  I wasn't able
>>> to find the original email despite several attempts.
>>>
>>> That patch got me thinking about what else could be excluded and after
>>> some investigation I found the following: pg_notify, pg_serial,
>>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>>> or rebuilt on server start.
>>
>> Eh ... I doubt very much that it's safe to blow away the entire
>> contents of an SLRU between shutdown and startup, even if the data is
>> technically transient data that won't be needed again after the system
>> is reset.
>
> I've done pretty extensive testing in pgBackRest and haven't seen issues
> in any supported version (plus I audited each init() function for every
> version back to where it was introduced).  The patch also passes all the
> pg_basebackup TAP tests in master.
>
> If you are correct it may indicate a problem anyway. Consider a standby
> backup where the files in these directories may be incredibly stale
> since they are not replicated.  Once restored to a master should we
> trust anything in these files?
>
> pg_serial, pg_notify, pg_subtrans are not even fsync'd
> (SlruCtl->do_fsync = false).  It's hard to imagine there's anything of
> value in there or that it can be trusted if there is.

It's not just a question of whether the data has value; it's a
question of whether the SLRU code will handle the situation correctly
in all cases if the directory contains no files.  I don't think you
can draw a firm conclusion on that without reading the code.

> The files in pg_snapshot and pg_dynshmem are simply deleted on startup
> so that seems safe enough.

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-17 Thread David Steele
Hi Robert,

On 8/17/16 11:27 AM, Robert Haas wrote:
> On Mon, Aug 15, 2016 at 3:39 PM, David Steele  wrote:
>> Recently a hacker proposed a patch to add pg_dynshmem to the list of
>> directories whose contents are excluded in pg_basebackup.  I wasn't able
>> to find the original email despite several attempts.
>>
>> That patch got me thinking about what else could be excluded and after
>> some investigation I found the following: pg_notify, pg_serial,
>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>> or rebuilt on server start.
>
> Eh ... I doubt very much that it's safe to blow away the entire
> contents of an SLRU between shutdown and startup, even if the data is
> technically transient data that won't be needed again after the system
> is reset.

I've done pretty extensive testing in pgBackRest and haven't seen issues
in any supported version (plus I audited each init() function for every
version back to where it was introduced).  The patch also passes all the
pg_basebackup TAP tests in master.

If you are correct it may indicate a problem anyway. Consider a standby
backup where the files in these directories may be incredibly stale
since they are not replicated.  Once restored to a master should we
trust anything in these files?

pg_serial, pg_notify, pg_subtrans are not even fsync'd
(SlruCtl->do_fsync = false).  It's hard to imagine there's anything of
value in there or that it can be trusted if there is.

The files in pg_snapshot and pg_dynshmem are simply deleted on startup
so that seems safe enough.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-17 Thread Alvaro Herrera
Robert Haas wrote:

> Eh ... I doubt very much that it's safe to blow away the entire
> contents of an SLRU between shutdown and startup, even if the data is
> technically transient data that won't be needed again after the system
> is reset.

Hmm.  At least async.c (pg_notify) deletes the whole directory at
startup so it's fine, but for the others (pg_serial, pg_subtrans) I
think it'd be saner to keep any "active" files (probably just the
latest).  I don't remember how pg_snapshot works, but it's probably fine
to start with an empty subdir (is it possible to export a snapshot from
a prepared transaction?)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-17 Thread Robert Haas
On Mon, Aug 15, 2016 at 3:39 PM, David Steele  wrote:
> Recently a hacker proposed a patch to add pg_dynshmem to the list of
> directories whose contents are excluded in pg_basebackup.  I wasn't able
> to find the original email despite several attempts.
>
> That patch got me thinking about what else could be excluded and after
> some investigation I found the following: pg_notify, pg_serial,
> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
> or rebuilt on server start.

Eh ... I doubt very much that it's safe to blow away the entire
contents of an SLRU between shutdown and startup, even if the data is
technically transient data that won't be needed again after the system
is reset.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-15 Thread David Steele

On 8/15/16 4:58 PM, Jim Nasby wrote:

On 8/15/16 2:39 PM, David Steele wrote:

That patch got me thinking about what else could be excluded and after
some investigation I found the following: pg_notify, pg_serial,
pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
or rebuilt on server start.


If someone wanted to scratch an itch, it would also be useful to put
things that are zeroed under a single dedicated directory, so that folks
who wanted to could mount that on a ram drive. It would also be useful
to do that for stuff that's only reset on crash (to put it on local
storage as opposed to a SAN).


I definitely have something like that in mind and thought this was a 
good place to start.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-15 Thread Jim Nasby

On 8/15/16 2:39 PM, David Steele wrote:

That patch got me thinking about what else could be excluded and after
some investigation I found the following: pg_notify, pg_serial,
pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
or rebuilt on server start.


If someone wanted to scratch an itch, it would also be useful to put 
things that are zeroed under a single dedicated directory, so that folks 
who wanted to could mount that on a ram drive. It would also be useful 
to do that for stuff that's only reset on crash (to put it on local 
storage as opposed to a SAN).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-15 Thread Stephen Frost
David,

* David Steele (da...@pgmasters.net) wrote:
> Recently a hacker proposed a patch to add pg_dynshmem to the list of
> directories whose contents are excluded in pg_basebackup.  I wasn't able
> to find the original email despite several attempts.

That would be here:

b4e94836-786b-6020-e1b3-3d7390f95...@aiven.io

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-15 Thread David Steele
Recently a hacker proposed a patch to add pg_dynshmem to the list of
directories whose contents are excluded in pg_basebackup.  I wasn't able
to find the original email despite several attempts.

That patch got me thinking about what else could be excluded and after
some investigation I found the following: pg_notify, pg_serial,
pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
or rebuilt on server start.

The attached patch adds these directories to the pg_basebackup
exclusions and takes an array-based approach to excluding directories
and files during backup.

I also incorporated Ashutosh's patch to fix corruption when
pg_stat_tmp/pg_replslot are links [1].  This logic has been extended to
all excluded directories.

Perhaps these patches should be merged in the CF but first I'd like to
see if anyone can identify problems with the additional exclusions.

Thanks,
-- 
-David
da...@pgmasters.net

[1]
http://www.postgresql.org/message-id/flat/CAE9k0Pm7=x_o0w7e2b2s2cwczdcbgczgdrxttzxozgp8beb...@mail.gmail.com/
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index da9b7a6..209b2cb 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,6 +30,7 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/builtins.h"
@@ -69,9 +70,6 @@ static void throttle(size_t increment);
 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
 
-/* Relative path of temporary statistics directory */
-static char *statrelpath = NULL;
-
 /*
  * Size of each block sent into the tar stream for larger files.
  */
@@ -95,6 +93,68 @@ static int64 elapsed_min_unit;
 static int64 throttled_last;
 
 /*
+ * The contents of these directories are removed or recreated during server
+ * start so they will not be included in the backup.  The directory entry
+ * will be included to preserve permissions.
+ */
+#define EXCLUDE_DIR_MAX8
+#define EXCLUDE_DIR_STAT_TMP   0
+
+const char *excludeDirContents[EXCLUDE_DIR_MAX] =
+{
+   /*
+* Skip temporary statistics files. The first array position will be
+* filled with the value of pgstat_stat_directory relative to PGDATA.
+* PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
+* because PGSS_TEXT_FILE is always created there.
+*/
+   NULL,
+   PG_STAT_TMP_DIR,
+
+   /* Recreated on startup, see StartupReplicationSlots(). */
+   "pg_replslot",
+
+   /* Removed on startup, see dsm_cleanup_for_mmap(). */
+   PG_DYNSHMEM_DIR,
+
+   /* Recreated/zeroed on startup, see AsyncShmemInit(). */
+   "pg_notify",
+
+   /* Recreated on startup, see OldSerXidInit(). */
+   "pg_serial",
+
+   /* Removed on startup, see DeleteAllExportedSnapshotFiles(). */
+   "pg_snapshots",
+
+   /* Recreated/zeroed on startup, see StartupSUBTRANS(. */
+   "pg_subtrans"
+};
+
+/*
+ * Files that should not be included in the backup.
+ */
+#define EXCLUDE_FILE_MAX   5
+
+const char *excludeFile[EXCLUDE_FILE_MAX] =
+{
+   /* Skip auto conf temporary file. */
+   PG_AUTOCONF_FILENAME ".tmp",
+
+   /*
+* If there's a backup_label or tablespace_map file, it belongs to a 
backup
+* started by the user with pg_start_backup(). It is *not* correct for 
this
+* backup, our backup_label/tablespace_map is injected into the tar
+* separately.
+*/
+   BACKUP_LABEL_FILE,
+   TABLESPACE_MAP,
+
+   /* Skip postmaster.pid and postmaster.opts. */
+   "postmaster.pid",
+   "postmaster.opts"
+};
+
+/*
  * Called when ERROR or FATAL happens in perform_base_backup() after
  * we have started the backup - make sure we end it!
  */
@@ -154,11 +214,13 @@ perform_base_backup(basebackup_options *opt, DIR 
*tblspcdir)
 */
if (is_absolute_path(pgstat_stat_directory) &&
strncmp(pgstat_stat_directory, DataDir, datadirpathlen) 
== 0)
-   statrelpath = psprintf("./%s", pgstat_stat_directory + 
datadirpathlen + 1);
+   excludeDirContents[EXCLUDE_DIR_STAT_TMP] =
+   pgstat_stat_directory + datadirpathlen + 1;
else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
-   statrelpath = psprintf("./%s", pgstat_stat_directory);
+   excludeDirContents[EXCLUDE_DIR_STAT_TMP] = 
pgstat_stat_directory;
else
-   statrelpath = pgstat_stat_directory;
+   excludeDirContents[EXCLUDE_DIR_STAT_TMP] =
+   pgstat_stat_directory + 2;
 
/* Add a node for the base directory at the end */