Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-28 Thread Michael Paquier
On Thu, Mar 29, 2018 at 05:01:40AM +0900, Fujii Masao wrote:
> Committed. Thanks!

Thanks for including as well the documentation changes and committing
it.  The result looks good to me.
--
Michael


signature.asc
Description: PGP signature


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-28 Thread Fujii Masao
On Wed, Mar 28, 2018 at 7:54 AM, Michael Paquier  wrote:
> On Wed, Mar 28, 2018 at 04:13:25AM +0900, Fujii Masao wrote:
>> This code is almost ok in practice, but using the check of
>> "strstr(path, localpath) == path" is more robust here?
>
> No problems with that either.
>
>> Using the following code instead is more robust?
>> This original code is almost ok in practice, though.
>>
>> filename = last_dir_separator(path);
>> if (filename == NULL)
>> filename = path;
>> else
>> filename++;
>> if (strcmp(filename, excludeFiles[excludeIdx]) == 0)
>
> Indeed, using last_dir_separator is a better idea for files.  I was
> looking for something like that actually.
>
>> +  (everything except the relation files). Similarly to base backups,
>> +  the contents of the directories pg_dynshmem/,
>> +  pg_notify/, pg_replslot/,
>> +  pg_serial/, pg_snapshots/,
>> +  pg_stat_tmp/, and
>> +  pg_subtrans/ are omitted from the data copied
>> +  from the source cluster. Any file or directory beginning with
>> +  pgsql_tmp is omitted, as well as are
>> +  backup_label,
>> +  tablespace_map,
>> +  pg_internal.init,
>> +  postmaster.opts and
>> +  postmaster.pid.
>>
>> I don't think this description is necessary. The doc for pg_basebackup
>> also doesn't seem to have this kind of description.
>
> Those are listed in backup.sgml.  And I really think that we should at
> least document that the same type of exclusion filters as base backups
> are used in pg_rewind.

Okay, I revived that change in the doc.

>> So attached is the patch that I updated based on your patch and
>> am thinking to apply.
>
> Thanks.  Except for the documentation part, I am OK for the changes
> proposed.

Committed. Thanks!

Regards,

-- 
Fujii Masao



Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-27 Thread Michael Paquier
On Wed, Mar 28, 2018 at 04:13:25AM +0900, Fujii Masao wrote:
> This code is almost ok in practice, but using the check of
> "strstr(path, localpath) == path" is more robust here?

No problems with that either.

> Using the following code instead is more robust?
> This original code is almost ok in practice, though.
> 
> filename = last_dir_separator(path);
> if (filename == NULL)
> filename = path;
> else
> filename++;
> if (strcmp(filename, excludeFiles[excludeIdx]) == 0)

Indeed, using last_dir_separator is a better idea for files.  I was
looking for something like that actually.

> +  (everything except the relation files). Similarly to base backups,
> +  the contents of the directories pg_dynshmem/,
> +  pg_notify/, pg_replslot/,
> +  pg_serial/, pg_snapshots/,
> +  pg_stat_tmp/, and
> +  pg_subtrans/ are omitted from the data copied
> +  from the source cluster. Any file or directory beginning with
> +  pgsql_tmp is omitted, as well as are
> +  backup_label,
> +  tablespace_map,
> +  pg_internal.init,
> +  postmaster.opts and
> +  postmaster.pid.
> 
> I don't think this description is necessary. The doc for pg_basebackup
> also doesn't seem to have this kind of description.

Those are listed in backup.sgml.  And I really think that we should at
least document that the same type of exclusion filters as base backups
are used in pg_rewind.  If you don't want to include the whole list,
then let's use a reference to backup-lowlevel-base-backup-data.

> So attached is the patch that I updated based on your patch and
> am thinking to apply.

Thanks.  Except for the documentation part, I am OK for the changes
proposed.
--
Michael


signature.asc
Description: PGP signature


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-27 Thread Fujii Masao
On Tue, Mar 27, 2018 at 10:55 AM, Michael Paquier  wrote:
> On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote:
>> +1. It's better for us to focus on the code change of the fillter on 
>> pg_rewind
>> rather than such "refactoring".
>
> (filter takes one 'l', not two)
>
> Okay.  I had my mind mostly focused on how to shape the exclusion list
> and get it shared between the base backup and pg_rewind, so let's move
> on with the duplicated list for now.  I did not put much efforts into
> the pg_rewind portion to be honest.
>
>> As I told upthread, the previous patch has the
>> problem where the files which should be skipped are not skipped. ISTM that,
>> in pg_rewind, the filter should be triggered in recurse_dir() not
>> process_source_file().
>
> If you put that into recurse_dir you completely ignore the case where
> changes are fetched by libpq.  Doing the filtering when processing the
> file map has the advantage to take care of both the local and remote
> cases, which is why I am doing it there.

OK.

>> BTW what should pg_rewind do when it finds the directory which should be
>> skipped, in the source directory? In your patch, pg_rewind just tries to skip
>> that directory at all. But isn't this right behavior? If that directory 
>> doesn't
>> exist in the target directory (though I'm not sure if this situation is 
>> really
>> possible), I'm thinking that pg_rewind should create that "empty" directory
>> in the target. No?
>
> I am not exactly sure what you are coming up with here.  The target
> server should have the same basic directory mapping as the source as the
> target has been initialized normally with initdb or a base backup from
> another node, so checking for the *contents* of directories is enough
> and keeps the code more simple, as the exclude filter entries are based
> on elements inherent to PostgreSQL internals.  Please note as well that
> if a non-system directory is present on the source but not the target
> then it would get created on the target.
>
> At the end I have finished with the attached.

Thanks for the patch!

+ snprintf(localpath, sizeof(localpath), "%s/",
+ excludeDirContents[excludeIdx]);
+ if (strstr(path, localpath) != NULL)

This code is almost ok in practice, but using the check of
"strstr(path, localpath) == path" is more robust here?

+ for (excludeIdx = 0; excludeFiles[excludeIdx] != NULL; excludeIdx++)
+ {
+ if (strstr(path, excludeFiles[excludeIdx]) != NULL)

Using the following code instead is more robust?
This original code is almost ok in practice, though.

filename = last_dir_separator(path);
if (filename == NULL)
filename = path;
else
filename++;
if (strcmp(filename, excludeFiles[excludeIdx]) == 0)

+  (everything except the relation files). Similarly to base backups,
+  the contents of the directories pg_dynshmem/,
+  pg_notify/, pg_replslot/,
+  pg_serial/, pg_snapshots/,
+  pg_stat_tmp/, and
+  pg_subtrans/ are omitted from the data copied
+  from the source cluster. Any file or directory beginning with
+  pgsql_tmp is omitted, as well as are
+  backup_label,
+  tablespace_map,
+  pg_internal.init,
+  postmaster.opts and
+  postmaster.pid.

I don't think this description is necessary. The doc for pg_basebackup
also doesn't seem to have this kind of description.

So attached is the patch that I updated based on your patch and
am thinking to apply.

Regards,

-- 
Fujii Masao


add_exclude_list_similar_to_base_backup_v2_fujii.patch
Description: Binary data


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-26 Thread Michael Paquier
On Sat, Mar 24, 2018 at 09:12:09PM -0400, Stephen Frost wrote:
> Yeah, neither 2 or 3 really appeals to me.  Option 1 does touch a number
> of places but in a pretty straight-forward way- and if there's a typo
> there, the compiler is likely to complain, so it seems like the risk is
> relatively low.

One example of place which can be easily consolidated is pg_wal whose
definition is in xlog_internal.h.  And there are a couple of other
places which could be consolidated without major refactoring like what I
proposed initially on this thread.  I would suggest to focus on this
effort on a separate thread later on.
--
Michael


signature.asc
Description: PGP signature


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-26 Thread Michael Paquier
On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote:
> +1. It's better for us to focus on the code change of the fillter on pg_rewind
> rather than such "refactoring".

(filter takes one 'l', not two)

Okay.  I had my mind mostly focused on how to shape the exclusion list
and get it shared between the base backup and pg_rewind, so let's move
on with the duplicated list for now.  I did not put much efforts into
the pg_rewind portion to be honest.

> As I told upthread, the previous patch has the
> problem where the files which should be skipped are not skipped. ISTM that,
> in pg_rewind, the filter should be triggered in recurse_dir() not
> process_source_file().

If you put that into recurse_dir you completely ignore the case where
changes are fetched by libpq.  Doing the filtering when processing the
file map has the advantage to take care of both the local and remote
cases, which is why I am doing it there.  So you would just get half of
the cake and not the whole of it.

> BTW what should pg_rewind do when it finds the directory which should be
> skipped, in the source directory? In your patch, pg_rewind just tries to skip
> that directory at all. But isn't this right behavior? If that directory 
> doesn't
> exist in the target directory (though I'm not sure if this situation is 
> really 
> possible), I'm thinking that pg_rewind should create that "empty" directory
> in the target. No?

I am not exactly sure what you are coming up with here.  The target
server should have the same basic directory mapping as the source as the
target has been initialized normally with initdb or a base backup from
another node, so checking for the *contents* of directories is enough
and keeps the code more simple, as the exclude filter entries are based
on elements inherent to PostgreSQL internals.  Please note as well that
if a non-system directory is present on the source but not the target
then it would get created on the target.

At the end I have finished with the attached.  I have taken the decision
to not include as well xlog.h in pg_rewind to avoid having to drag a lot
of backend-only headers like pg_resetwal does, which I prefer avoid as
that's only hardcoding values for "backup_label" and "tablespace_map".
This applies filters based on directory contents, so by running the
regression tests you can see entries like the following ones:
entry "postmaster.opts" excluded from source file list
entry "pg_subtrans/" excluded from source file list
entry "pg_notify/" excluded from source file list
entry "base/12360/pg_internal.init" excluded from source file list
entry "backup_label.old" excluded from source file list
entry "global/pg_internal.init" excluded from source file list
entry "postmaster.opts" excluded from target file list
entry "pg_subtrans/" excluded from target file list
entry "pg_notify/" excluded from target file list
entry "base/12360/pg_internal.init" excluded from target file list
entry "global/pg_internal.init" excluded from target file list

Processing the filemap list on the target also matters in my opinion.
When at recovery, all the previous files will be wiped out, and we
should not remove either things like postmaster.pid as those are around
to prevent corruption problems.

Thanks,
--
Michael
From ba4502c9711c7b7933a5ef2186e895c0ae6b4616 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 5 Feb 2018 15:48:35 +0900
Subject: [PATCH] Add exclude list similar to base backups in pg_rewind

After being rewound, a standby to-be-recycled needs to perform recovery
from the last checkpoint where WAL forked after a promotion, which leads
it to automatically remove some files which may have been copied from
the source cluster. This makes use of the same filtering list as base
backups to find out what is this data and then remove it. This reduces
the amount of data transferred during a rewind without changing the
usefulness of the operation.

Documentation is updated to take into account what is filtered out.
---
 doc/src/sgml/ref/pg_rewind.sgml  |  14 +++-
 src/backend/replication/basebackup.c |   3 +
 src/bin/pg_rewind/filemap.c  | 144 ---
 3 files changed, 148 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 8e49249826..520d843f0e 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -231,7 +231,19 @@ PostgreSQL documentation
  
   Copy all other files such as pg_xact and
   configuration files from the source cluster to the target cluster
-  (everything except the relation files).
+  (everything except the relation files). Similarly to base backups,
+  the contents of the directories pg_dynshmem/,
+  pg_notify/, pg_replslot/,
+  pg_serial/, pg_snapshots/,
+  pg_stat_tmp/, and
+  pg_subtrans/ are omitted from the data copied
+  from the source cluster. Any file or 

Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-26 Thread Fujii Masao
On Sun, Mar 25, 2018 at 5:06 PM, Michael Paquier  wrote:
> On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote:
>> Stephen Frost  writes:
>>> I don't completely buy off on the argument that having these #define's
>>> would make it easier for forks (we've had quite a few folks fork PG, but
>>> how many of them have actually changed "base"?) and I'm on the fence
>>> about if these will make our lives simpler down the road when it comes
>>> to changing the directory names
>>
>> I am distressed that nobody, apparently, is putting any weight on the
>> back-patching pain that will result from widespread replacement of path
>> names with macros.  I don't buy that either we or anyone else will need
>> to change these names in future, so I see pain and effectively no
>> gain.
>
> That's actually something I worry about as well (as the author!), which
> is why I qualify the changes as intrusive.  At the end, I think that I
> would be tempted to just do #3, aka to keep a copy of the filter list in
> pg_rewind code while hardcoding a minimum of names and mention in both
> basebackup.c and pg_rewind code to not forget to update the filter list
> if necessary.

+1. It's better for us to focus on the code change of the fillter on pg_rewind
rather than such "refactoring". As I told upthread, the previous patch has the
problem where the files which should be skipped are not skipped. ISTM that,
in pg_rewind, the fillter should be triggered in recurse_dir() not
process_source_file().

BTW what should pg_rewind do when it finds the directory which should be
skipped, in the source directory? In your patch, pg_rewind just tries to skip
that directory at all. But isn't this right behavior? If that directory doesn't
exist in the target directory(though I'm not sure if this situation is really
possible), I'm thinking that pg_rewind should create that "empty" directory
 in the target. No?

Regards,

-- 
Fujii Masao



Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-25 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote:
> > Stephen Frost  writes:
> >> I don't completely buy off on the argument that having these #define's
> >> would make it easier for forks (we've had quite a few folks fork PG, but
> >> how many of them have actually changed "base"?) and I'm on the fence
> >> about if these will make our lives simpler down the road when it comes
> >> to changing the directory names
> > 
> > I am distressed that nobody, apparently, is putting any weight on the
> > back-patching pain that will result from widespread replacement of path
> > names with macros.  I don't buy that either we or anyone else will need
> > to change these names in future, so I see pain and effectively no
> > gain.

While I agree that some consideration should be given to the impact a
change has on back-patching, I continue to be of the opinion that this
would be a good change, for the reasons which I outlined up-thread.
That said, I don't hold that position very strongly.

> That's actually something I worry about as well (as the author!), which
> is why I qualify the changes as intrusive.  At the end, I think that I
> would be tempted to just do #3, aka to keep a copy of the filter list in
> pg_rewind code while hardcoding a minimum of names and mention in both
> basebackup.c and pg_rewind code to not forget to update the filter list
> if necessary.  New paths in the data folder are not added on a monthly
> basis either, and not all of them can be filtered out so that's easy to
> maintain.

Intrusive, at least from my viewpoint, is more about how the code is
changed around and/or refactored than about the impact it has on
back-patching.  These are pretty mechanical changes, after all.

I'm not particularly thrilled with the idea of having two independent
lists of hard-coded paths to maintain, even with comments in both places
saying to update the other list.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-25 Thread Michael Paquier
On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote:
> Stephen Frost  writes:
>> I don't completely buy off on the argument that having these #define's
>> would make it easier for forks (we've had quite a few folks fork PG, but
>> how many of them have actually changed "base"?) and I'm on the fence
>> about if these will make our lives simpler down the road when it comes
>> to changing the directory names
> 
> I am distressed that nobody, apparently, is putting any weight on the
> back-patching pain that will result from widespread replacement of path
> names with macros.  I don't buy that either we or anyone else will need
> to change these names in future, so I see pain and effectively no
> gain.

That's actually something I worry about as well (as the author!), which
is why I qualify the changes as intrusive.  At the end, I think that I
would be tempted to just do #3, aka to keep a copy of the filter list in
pg_rewind code while hardcoding a minimum of names and mention in both
basebackup.c and pg_rewind code to not forget to update the filter list
if necessary.  New paths in the data folder are not added on a monthly
basis either, and not all of them can be filtered out so that's easy to
maintain.
--
Michael


signature.asc
Description: PGP signature


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-24 Thread Tom Lane
Stephen Frost  writes:
> I don't completely buy off on the argument that having these #define's
> would make it easier for forks (we've had quite a few folks fork PG, but
> how many of them have actually changed "base"?) and I'm on the fence
> about if these will make our lives simpler down the road when it comes
> to changing the directory names

I am distressed that nobody, apparently, is putting any weight on the
back-patching pain that will result from widespread replacement of path
names with macros.  I don't buy that either we or anyone else will need
to change these names in future, so I see pain and effectively no gain.

Furthermore, I think it's completely silly to claim that this sort of
thing is a gain in readability or understandability:

-path = psprintf("base/%u/t%d_%u",
-dbNode, backendId, relNode);
+path = psprintf("%s/%u/t%d_%u",
+PG_BASE_DIR, dbNode, backendId, relNode);

For my money it's a loss on both points.  The extra level of indirection
is just obscuring what's actually happening and putting extra cognitive
load on the reader.

We have better things to spend our time on.

regards, tom lane



Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-24 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Fri, Mar 23, 2018 at 01:38:38AM +0900, Fujii Masao wrote:
> > Personally it looks very intrusive, so I'm feeling inclined to push
> > the changes without that refactoring.

I've been reading over the first couple of posted patches and mulling
over the changes proposed.  They certainly touch a lot of places but
they're pretty straight-forward changes and so I'm not really sure I'd
call them all that intrusive.

I don't completely buy off on the argument that having these #define's
would make it easier for forks (we've had quite a few folks fork PG, but
how many of them have actually changed "base"?) and I'm on the fence
about if these will make our lives simpler down the road when it comes
to changing the directory names (if we changed "pg_multixact/members" to
be "pg_multixact/processes" or some such, I imagine we'd also go through
and change PG_MULTIXACT_MEMBERS_DIR to be PG_MULTIXACT_PROCESSES_DIR
anyway, so this doesn't result in much improvement there), but as it
relates to new tool development and such, there's some value here
because the compiler is going to complain if you say
PG_COMMIT_TX_DIR and there is no such #define, whereas a similar mistake
with a string literal of "pg_commit_tx" might end up getting missed and
that would be unfortunate.

Therefore, on the whole, I'm +1 on these changes, but I'd argue a bit
about some of the choices made:

- Let's have them all be PG_WHATEVER_DIR/FILE
- Use WAL instead of XLOG (I thought we agreed new code would..?)
- Remove extraneous #define's (most of these were, but
  DIRECTORY_LOCK_FILE was kept..?  Let's use PG_POSTMASTER_PID_FILE
  throughout instead, with a comment that it's also used as a lock
  file).

Might be nice to go back and modify other pre-existing #define's to use
that form also, but perhaps not that big of a deal.  I would have that
be independent from this, in any case.

> Okay.  Just moving the list of items from basebackup.c to a dedicated
> header is not sufficient though as things like RELCACHE_INIT_FILENAME as
> declared in headers which are backend-only.  The same applies to
> LOG_METAINFO_DATAFILE_TMP, PG_AUTOCONF_FILENAME, PG_STAT_TMP_DIR and
> PG_DYNSHMEM_DIR.
> 
> BACKUP_LABEL_FILE and TABLESPACE_MAP can be included though via xlog.h.
> So what are you looking for?  I see a couple of options:
> 1) The inclusive refactoring, which you are discarding.
> 2) A dedicated header, but some of the now-not-hardcoded values will
> need to be so.  That's the list I am giving above.
> 3) A copy of the list from basebackup.c to src/bin/pg_rewind/.
> 
> I would guess that you are looking for 2), but I am not sure if you
> imply that 3) would be acceptable or not.

Yeah, neither 2 or 3 really appeals to me.  Option 1 does touch a number
of places but in a pretty straight-forward way- and if there's a typo
there, the compiler is likely to complain, so it seems like the risk is
relatively low.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-22 Thread Michael Paquier
On Fri, Mar 23, 2018 at 01:38:38AM +0900, Fujii Masao wrote:
> Personally it looks very intrusive, so I'm feeling inclined to push
> the changes without that refactoring.

Okay.  Just moving the list of items from basebackup.c to a dedicated
header is not sufficient though as things like RELCACHE_INIT_FILENAME as
declared in headers which are backend-only.  The same applies to
LOG_METAINFO_DATAFILE_TMP, PG_AUTOCONF_FILENAME, PG_STAT_TMP_DIR and
PG_DYNSHMEM_DIR.

BACKUP_LABEL_FILE and TABLESPACE_MAP can be included though via xlog.h.
So what are you looking for?  I see a couple of options:
1) The inclusive refactoring, which you are discarding.
2) A dedicated header, but some of the now-not-hardcoded values will
need to be so.  That's the list I am giving above.
3) A copy of the list from basebackup.c to src/bin/pg_rewind/.

I would guess that you are looking for 2), but I am not sure if you
imply that 3) would be acceptable or not.
--
Michael


signature.asc
Description: PGP signature


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-22 Thread Fujii Masao
On Tue, Mar 13, 2018 at 9:48 PM, Michael Paquier  wrote:
> On Tue, Mar 13, 2018 at 01:16:27PM +0300, Anastasia Lubennikova wrote:
>> Since these patches contain mostly cosmetic changes and do not break
>> anything, I think it's fine to mark this thread as Ready For Committer
>> without long discussion.
>
> Thanks Anastasia for the review.  The refactoring is quite intuitive I
> think, still that's perhaps a bit too much intrusive.  Extra opinions
> about that are welcome.

Personally it looks very intrusive, so I'm feeling inclined to push
the changes without that refactoring.

The 0005 patch doesn't seem to be right. The patch changes process_source_file()
so that it excludes some directories like pg_notify from the filemap. However,
after that, process_source_file() is called for the files under those "excluded"
directories and then they are not excluded. For example, pg_notify/ file is
unexpectedly included in the filemap and copied by pg_rewind.

This problem happens because recurse_dir() has the following code and
ISTM you forgot to take into account it.

else if (S_ISDIR(fst.st_mode))
{
callback(path, FILE_TYPE_DIRECTORY, 0, NULL);
/* recurse to handle subdirectories */
recurse_dir(datadir, path, callback);
}

Regards,

-- 
Fujii Masao



Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-13 Thread Michael Paquier
On Tue, Mar 13, 2018 at 01:16:27PM +0300, Anastasia Lubennikova wrote:
> Since these patches contain mostly cosmetic changes and do not break
> anything, I think it's fine to mark this thread as Ready For Committer
> without long discussion.

Thanks Anastasia for the review.  The refactoring is quite intuitive I
think, still that's perhaps a bit too much intrusive.  Extra opinions
about that are welcome.

As I read again the patch set, please note that I am not much happy yet
about the part for the handling of temporary files when applying the
filters in pg_rewind and the lack inconsistency for file filters and
directory filters...
--
Michael


signature.asc
Description: PGP signature


Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-13 Thread Anastasia Lubennikova

05.02.2018 10:10, Michael Paquier:

So the patch set attached is made of the following:
- 0001, which refactors all hardcoded system paths into pg_paths.h.
This modifies only initdb.c and basebackup.c to ease reviews.
- 0002 spreads the path changes and the use of pg_paths.h across the
core code.
- 0003 moves the last set of definitions with backup_label,
tablespace_map and pg_internal.init.
- 0004 creates basebackup_paths.h, this can be consumed by pg_rewind.
- 0005 makes the changes for pg_rewind.

Thank you for this set of patches.
This refactoring makes code way more convenient to read and change.

Due to some merge conflicts, patch 0002 was not applying clearly.
So I attach the updated version.
I also noticed a couple of rows that were not updated, and wrote a patch 
0006,
which contains just minor changes and can be applied on top of any patch 
after 0003.


Since these patches contain mostly cosmetic changes and do not break 
anything,
I think it's fine to mark this thread as Ready For Committer without 
long discussion.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 6698d430ffb6bd2e33aba0b21dc2a9358cf6328c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 5 Feb 2018 13:03:38 +0900
Subject: [PATCH 1/5] Refactor path definitions into a single header file,
 pg_paths.h

The definition of all internal paths of PostgreSQL are spread across the
code base, making tracing of those definitions difficult to begin with,
and resulting in a lot of code paths to be hardcoded.  While this has no
real practical consequences in practice, this makes the creation of
lists manipulating those paths less maintainable as as things stand the
already-hardcoded paths need to be copied around more.  In order to
improve things for the long-term, move all those definitions into a
single header file.

This commit does a first step by switching basebackup.c and initdb.c to
use them.  An upcoming commit will make all the definitions across the
code base use this new header more consistently.
---
 src/backend/postmaster/postmaster.c  |   1 +
 src/backend/postmaster/syslogger.c   |   1 +
 src/backend/replication/basebackup.c |  16 ++--
 src/backend/storage/ipc/dsm.c|   1 +
 src/backend/storage/ipc/dsm_impl.c   |   1 +
 src/backend/storage/lmgr/predicate.c |   3 +-
 src/backend/utils/adt/misc.c |   1 +
 src/bin/initdb/initdb.c  |  45 ++--
 src/include/access/xlog_internal.h   |   7 +-
 src/include/pg_paths.h   | 137 +++
 src/include/pgstat.h |   3 -
 src/include/postmaster/syslogger.h   |   7 --
 src/include/storage/dsm_impl.h   |   9 ---
 src/include/utils/guc.h  |   7 --
 14 files changed, 176 insertions(+), 63 deletions(-)
 create mode 100644 src/include/pg_paths.h

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index f3ddf828bb..66d80914a0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -105,6 +105,7 @@
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pg_getopt.h"
+#include "pg_paths.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index f70eea37df..c8770f6c61 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -35,6 +35,7 @@
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "nodes/pg_list.h"
+#include "pg_paths.h"
 #include "pgstat.h"
 #include "pgtime.h"
 #include "postmaster/fork_process.h"
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dd7ad64862..63ab81f837 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -117,25 +117,25 @@ static const char *excludeDirContents[] =
 	 * even if the intention is to restore to another master. See backup.sgml
 	 * for a more detailed description.
 	 */
-	"pg_replslot",
+	PG_REPLSLOT_DIR,
 
 	/* Contents removed on startup, see dsm_cleanup_for_mmap(). */
 	PG_DYNSHMEM_DIR,
 
 	/* Contents removed on startup, see AsyncShmemInit(). */
-	"pg_notify",
+	PG_NOTIFY_DIR,
 
 	/*
 	 * Old contents are loaded for possible debugging but are not required for
 	 * normal operation, see OldSerXidInit().
 	 */
-	"pg_serial",
+	PG_SERIAL_DIR,
 
 	/* Contents removed on startup, see DeleteAllExportedSnapshotFiles(). */
-	"pg_snapshots",
+	PG_SNAPSHOTS_DIR,
 
 	/* Contents zeroed on startup, see StartupSUBTRANS(). */
-	"pg_subtrans",
+	PG_SUBTRANS_DIR,
 
 	/* end of list */
 	NULL
@@ -147,7 +147,7 @@ static const char *excludeDirContents[] =
 static const char *excludeFiles[] =
 {
 	/* Skip auto conf temporary file. */
-	PG_AUTOCONF_FILENAME ".tmp",
+	PG_AUTOCONF_FILENAME_TMP,
 
 	/* Skip current log file temporary file */