subject:"\"pg_rewind docs correction\""

Re: pg_rewind docs correction

2020-05-01 Thread James Coleman

On Fri, May 1, 2020 at 4:46 AM Michael Paquier  wrote:
>
> On Wed, Apr 29, 2020 at 09:15:06AM +0900, Michael Paquier wrote:
> > I am letting that aside for a couple of days to see if others have
> > more comments, and will likely commit it after an extra lookup.
>
> And applied after an extra lookup.  Thanks for the discussion, James.

Yep. Thanks for pushing to make sure it was as correct as possible
while improving it.

James

Re: pg_rewind docs correction

2020-05-01 Thread Michael Paquier

On Wed, Apr 29, 2020 at 09:15:06AM +0900, Michael Paquier wrote:
> I am letting that aside for a couple of days to see if others have
> more comments, and will likely commit it after an extra lookup.

And applied after an extra lookup.  Thanks for the discussion, James.
--
Michael


signature.asc
Description: PGP signature

Re: pg_rewind docs correction

2020-04-28 Thread Michael Paquier

On Tue, Apr 28, 2020 at 12:13:38PM -0400, James Coleman wrote:
> I think is missing a word. Instead of "especially the the target"
> should be "especially if the target".

Thanks, fixed.

> In this block:
> 
> +  Create a backup_label file to begin WAL replay at
> +  the checkpoint created at failover and configure the
> +  pg_control file with a minimum consistency LSN
> +  defined as the result of pg_current_wal_insert_lsn()
> +  when rewinding from a live source and using the last checkpoint LSN
> +  when rewinding from a stopped source.
> + 
> 
> Perhaps change "and using the last checkpoint LSN" to "or the last
> checkpoint LSN". Alternatively you could make the grammar parallel by
> changing to "and defined as the last checkpoint LSN", but that seems
> wordy, and the "defined as [item or item]" is already a good grammar
> construction.

Using your first suggestion of "or the last checkpoint LSN" sounds
more natural as of this morning, so updated the patch with that.

I am letting that aside for a couple of days to see if others have
more comments, and will likely commit it after an extra lookup.
--
Michael
From 48b9e85fe9ae414a0f42881558ab2fd920ba0c60 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 29 Apr 2020 09:11:28 +0900
Subject: [PATCH v6] Improve pg_rewind explanation and warnings

The pg_rewind docs currently assert that the state of the target's
data directory after rewind is equivalent to the source's data
directory. But that isn't quite true both because the base state is
further back in time and because the target's data directory will
include the current state on the source of any copied blocks.
Instead using the analogy to back backups helps explain the state,
as well as the pros and cons of using the utility.

The How It Works section now:
- Includes details about how the backup_label file is created.
- Includes details about how the pg_control file is updated.
- Is updated to include WAL segments and new relation files in the
  list of files copied wholesale from the source.

Finally, document clearly the state of the cluster after the operation
and also the operation sequencing dangers caused by copying
configuration files from the source.
---
 doc/src/sgml/ref/pg_rewind.sgml | 90 +
 1 file changed, 57 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 07c49e4719..391e5db2e2 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -48,14 +48,16 @@ PostgreSQL documentation
   
 
   
-   The result is equivalent to replacing the target data directory with the
-   source one. Only changed blocks from relation files are copied;
-   all other files are copied in full, including configuration files. The
-   advantage of pg_rewind over taking a new base backup, or
-   tools like rsync, is that pg_rewind does
-   not require reading through unchanged blocks in the cluster. This makes
-   it a lot faster when the database is large and only a small
-   fraction of blocks differ between the clusters.
+   After a successful rewind, the state of the target data directory is
+   analogous to a base backup of the source data directory. Unlike taking
+   a new base backup or using a tool like rsync,
+   pg_rewind does not require comparing or copying
+   unchanged relation blocks in the cluster. Only changed blocks from existing
+   relation files are copied; all other files, including new relation files,
+   configuration files, and WAL segments, are copied in full. As such the
+   rewind operation is significantly faster than other approaches when the
+   database is large and only a small fraction of blocks differ between the
+   clusters.
   
 
   
@@ -77,16 +79,18 @@ PostgreSQL documentation
   
 
   
-   When the target server is started for the first time after running
-   pg_rewind, it will go into recovery mode and replay all
-   WAL generated in the source server after the point of divergence.
-   If some of the WAL was no longer available in the source server when
-   pg_rewind was run, and therefore could not be copied by the
-   pg_rewind session, it must be made available when the
-   target server is started. This can be done by creating a
-   recovery.signal file in the target data directory
-   and configuring suitable 
-   in postgresql.conf.
+   After running pg_rewind, WAL replay needs to
+   complete for the data directory to be in a consistent state. When the
+   target server is started again it will enter archive recovery and replay
+   all WAL generated in the source server from the last checkpoint before
+   the point of divergence. If some of the WAL was no longer available in the
+   source server when pg_rewind was run, and
+   therefore could not be copied by the pg_rewind
+   session, it must be made available when the target server is started.
+   This can be done by creating a recovery.signal file
+   in

Re: pg_rewind docs correction

2020-04-28 Thread James Coleman

On Tue, Apr 28, 2020 at 12:31 AM Michael Paquier  wrote:
>
> On Mon, Mar 09, 2020 at 09:26:17AM -0400, James Coleman wrote:
> >> -  pg_stat_tmp/, and
> >> -  pg_subtrans/ are omitted from the data copied
> >> -  from the source cluster. Any file or directory beginning with
> >> -  pgsql_tmp is omitted, as well as are
> >> +  pg_stat_tmp/, and
> >> pg_subtrans/
> >> +  are omitted from the data copied from the source cluster. The files
> >>
> >> This is just reorganizing an existing list, why?
> >>
> >
> > The grammar seemed a bit awkward to me, so while I was already reworking
> > this paragraph I tried to clean that up a bit.
>
> Thanks for the new patch, and sorry for the delay.
>
> Okay, I saw what you were coming at here, with one sentence for
> directories, and one for files.
>
> > Still ongoing, correct? I guess I mentally think of them as being only one
> > month, but I guess that's not actually true. Regardless I'm not sure what
> > policy is for patches that have been in flight in hackers for a while but
> > just missed being added to the CF app.
>
> This is a documentation patch, so improving this part of the docs now
> is fine by me, particularly as this is an improvement.  Here are more
> notes from me:
> - I have removed the "As with a base backup" at the beginning of the
> second paragraph you modified.  The first paragraph modified already
> references a base backup, so one reference is enough IMO.
> - WAL replay does not happen from the WAL position where WAL diverged,
> but from the last checkpoint before WAL diverged.
> - Did some tweaks about the new part for configuration files, as it
> may actually not be necessary to update the configuration for recovery
> to complete (depending on the settings of the source, the target may
> just require the creation of a standby.signal file in its data
> directory particularly with a common archive location for multiple
> clusters).
> - Some word-smithing in the step-by-step description.
>
> Is the updated version fine for you?

In your revised patched the follow paragraph:

+   
+As pg_rewind copies configuration files
+entirely from the source, it may be required to correct the configuration
+used for recovery before restarting the target server, especially the
+the target is reintroduced as a standby of the source. If you restart
+the server after the rewind operation has finished but without configuring
+recovery, the target may again diverge from the primary.
+   

I think is missing a word. Instead of "especially the the target"
should be "especially if the target".

In this block:

+  Create a backup_label file to begin WAL replay at
+  the checkpoint created at failover and configure the
+  pg_control file with a minimum consistency LSN
+  defined as the result of pg_current_wal_insert_lsn()
+  when rewinding from a live source and using the last checkpoint LSN
+  when rewinding from a stopped source.
+ 

Perhaps change "and using the last checkpoint LSN" to "or the last
checkpoint LSN". Alternatively you could make the grammar parallel by
changing to "and defined as the last checkpoint LSN", but that seems
wordy, and the "defined as [item or item]" is already a good grammar
construction.

Other than those two small things, your proposed revision looks good to me.

Thanks,
James

Re: pg_rewind docs correction

2020-04-27 Thread Michael Paquier

On Mon, Mar 09, 2020 at 09:26:17AM -0400, James Coleman wrote:
>> -  pg_stat_tmp/, and
>> -  pg_subtrans/ are omitted from the data copied
>> -  from the source cluster. Any file or directory beginning with
>> -  pgsql_tmp is omitted, as well as are
>> +  pg_stat_tmp/, and
>> pg_subtrans/
>> +  are omitted from the data copied from the source cluster. The files
>>
>> This is just reorganizing an existing list, why?
>>
> 
> The grammar seemed a bit awkward to me, so while I was already reworking
> this paragraph I tried to clean that up a bit.

Thanks for the new patch, and sorry for the delay.

Okay, I saw what you were coming at here, with one sentence for
directories, and one for files.

> Still ongoing, correct? I guess I mentally think of them as being only one
> month, but I guess that's not actually true. Regardless I'm not sure what
> policy is for patches that have been in flight in hackers for a while but
> just missed being added to the CF app.

This is a documentation patch, so improving this part of the docs now
is fine by me, particularly as this is an improvement.  Here are more
notes from me:
- I have removed the "As with a base backup" at the beginning of the
second paragraph you modified.  The first paragraph modified already
references a base backup, so one reference is enough IMO.
- WAL replay does not happen from the WAL position where WAL diverged,
but from the last checkpoint before WAL diverged.
- Did some tweaks about the new part for configuration files, as it
may actually not be necessary to update the configuration for recovery
to complete (depending on the settings of the source, the target may
just require the creation of a standby.signal file in its data
directory particularly with a common archive location for multiple
clusters).
- Some word-smithing in the step-by-step description.

Is the updated version fine for you?
--
Michael
From 30d0e80e8e777c9b1c3f34aa281f9623e61ea17c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 28 Apr 2020 13:29:26 +0900
Subject: [PATCH v5] Improve pg_rewind explanation and warnings

The pg_rewind docs currently assert that the state of the target's
data directory after rewind is equivalent to the source's data
directory. But that isn't quite true both because the base state is
further back in time and because the target's data directory will
include the current state on the source of any copied blocks.
Instead using the analogy to back backups helps explain the state,
as well as the pros and cons of using the utility.

The How It Works section now:
- Includes details about how the backup_label file is created.
- Includes details about how the pg_control file is updated.
- Is updated to include WAL segments and new relation files in the
  list of files copied wholesale from the source.

Finally, document clearly the state of the cluster after the operation
and also the operation sequencing dangers caused by copying
configuration files from the source.
---
 doc/src/sgml/ref/pg_rewind.sgml | 90 +
 1 file changed, 57 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 07c49e4719..9525e09c98 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -48,14 +48,16 @@ PostgreSQL documentation
   
 
   
-   The result is equivalent to replacing the target data directory with the
-   source one. Only changed blocks from relation files are copied;
-   all other files are copied in full, including configuration files. The
-   advantage of pg_rewind over taking a new base backup, or
-   tools like rsync, is that pg_rewind does
-   not require reading through unchanged blocks in the cluster. This makes
-   it a lot faster when the database is large and only a small
-   fraction of blocks differ between the clusters.
+   After a successful rewind, the state of the target data directory is
+   analogous to a base backup of the source data directory. Unlike taking
+   a new base backup or using a tool like rsync,
+   pg_rewind does not require comparing or copying
+   unchanged relation blocks in the cluster. Only changed blocks from existing
+   relation files are copied; all other files, including new relation files,
+   configuration files, and WAL segments, are copied in full. As such the
+   rewind operation is significantly faster than other approaches when the
+   database is large and only a small fraction of blocks differ between the
+   clusters.
   
 
   
@@ -77,16 +79,18 @@ PostgreSQL documentation
   
 
   
-   When the target server is started for the first time after running
-   pg_rewind, it will go into recovery mode and replay all
-   WAL generated in the source server after the point of divergence.
-   If some of the WAL was no longer available in the source server when
-   pg_rewind was run, and therefore could not be copied by the
-   pg_rewind session, it must be made available when t

Re: pg_rewind docs correction

2020-03-09 Thread James Coleman

On Mon, Mar 9, 2020 at 2:59 AM Michael Paquier  wrote:

> On Sun, Mar 08, 2020 at 05:13:21PM -0400, James Coleman wrote:
> > I've added the information about how the backup label control file is
> > written, and updated the How It Works steps to refer to that separately
> > from restart.
> >
> > Additionally the How It Works is updated to include WAL segments and new
> > relation files in the list of files copied wholesale, since that was
> > previously stated but somewhat contradicted there.
>
> -   The result is equivalent to replacing the target data directory with
> the
> -   source one. Only changed blocks from relation files are copied;
> -   all other files are copied in full, including configuration files. The
> -   advantage of pg_rewind over taking a new
> base backup, or
> -   tools like rsync, is that
> pg_rewind does
> -   not require reading through unchanged blocks in the cluster. This makes
> -   it a lot faster when the database is large and only a small
> -   fraction of blocks differ between the clusters.
> +   After a successful rewind and subsequent WAL replay, the target data
> +   directory is equivalent to a base backup of the source data directory.
> While
> +   only changed blocks from existing relation files are copied; all other
> files
> +   are copied in full, including new relation files, configuration files,
> and WAL
> +   segments. The advantage of pg_rewind over
> taking a
>
> The first sentence you are adding refers to "subsequent WAL replay".
> However, this paragraph emphasizes with the state of the target
> cluster after running pg_rewind but *before* make the target cluster
> start recovery.  So shouldn't you just remove the part "and subsequent
> WAL replay" from your first new sentence?
>

I'd originally typed this:
I'm not sure I follow. After pg_rewind but before replay the directory is
*not* equivalent to a base backup. I don't see how paragraph is clearly
limited to describing what pg_rewind does. While the 2nd sentence is about
pg_rewind steps specifically, the paragraph (even in the original) goes on
to compare it to a base backup so we're talking about the operation in
totality not just the one tool.

But I realized while typing it that I was probably missing something of
what you were getting at: is the hangup on calling out the WAL replay that
a base backup (or rsync even) *also* requires WAL reply to reach a
consistent state? I hadn't thought of that while writing this initially, so
I've updated the patch to eliminate that part but also to make the analogy
to base backups more direct, since it's helpful in understanding what
result the tool is trying to accomplish and how it differs.

In the same paragraph, I think that you should remove the "While" from
> "While only changed blocks", as the second part of the sentence refers
> to the other files, WAl segments, etc.
>

Fixed as part of the above.


> The second paragraph of the docs regarding timeline lookup is
> unchanged, which is fine.
>
> -   When the target server is started for the first time after running
> -   pg_rewind, it will go into recovery mode
> and replay all
> -   WAL generated in the source server after the point of divergence.
> +   After running pg_rewind the data directory
> is
> +   not immediately in a consistent state. However
> +   pg_rewind configures the control file so
> that when
> +   the target server is started again it will enter recovery mode and
> replay all
> +   WAL generated in the source server after the point of divergence.
>
> The second part of the third paragraph is not changed, and the
> modification you are doing here is about the control file.  I am
> still unconvinced that this is a good change, because mentioning the
> control file would be actually more adapted to the part "How it
> works", where you are adding details about the backup_label file, and
> already include details about the minimum consistency LSN itself
> stored in the control file.
>

I've removed the control file reference and instead continued the analogy
to base backups.


> +   
> +Because pg_rewind copies configuration
> files
> +entirely from the source, correcting recovery configuration options
> before
> +restarting the server is necessary if you intend to re-introduce the
> target
> +as a replica of the source. If you restart the server after the rewind
> +operation has finished but without configuring recovery, the target
> will
> +again diverge from the primary.
> +   
>
> True that this is not outlined enough.
>

Thanks.


> +  The relation files are now to their state at the last checkpoint
> completed
> +  prior to the point at which the WAL timelines of the source and
> target
> +  diverged plus the current state on the source of any blocks changed
> on the
> +  target after that divergence.
>
> "Relation files are now in a state equivalent to the moment of the
> last completed checkpoint prior to the point.."?
>

Updated.


> -  pg_s

Re: pg_rewind docs correction

2020-03-09 Thread Michael Paquier

On Sun, Mar 08, 2020 at 05:13:21PM -0400, James Coleman wrote:
> I've added the information about how the backup label control file is
> written, and updated the How It Works steps to refer to that separately
> from restart.
> 
> Additionally the How It Works is updated to include WAL segments and new
> relation files in the list of files copied wholesale, since that was
> previously stated but somewhat contradicted there.

-   The result is equivalent to replacing the target data directory with the
-   source one. Only changed blocks from relation files are copied;
-   all other files are copied in full, including configuration files. The
-   advantage of pg_rewind over taking a new base 
backup, or
-   tools like rsync, is that 
pg_rewind does
-   not require reading through unchanged blocks in the cluster. This makes
-   it a lot faster when the database is large and only a small
-   fraction of blocks differ between the clusters.
+   After a successful rewind and subsequent WAL replay, the target data
+   directory is equivalent to a base backup of the source data directory. While
+   only changed blocks from existing relation files are copied; all other files
+   are copied in full, including new relation files, configuration files, and 
WAL
+   segments. The advantage of pg_rewind over taking 
a

The first sentence you are adding refers to "subsequent WAL replay".
However, this paragraph emphasizes with the state of the target
cluster after running pg_rewind but *before* make the target cluster
start recovery.  So shouldn't you just remove the part "and subsequent
WAL replay" from your first new sentence?

In the same paragraph, I think that you should remove the "While" from
"While only changed blocks", as the second part of the sentence refers
to the other files, WAl segments, etc.

The second paragraph of the docs regarding timeline lookup is
unchanged, which is fine.

-   When the target server is started for the first time after running
-   pg_rewind, it will go into recovery mode and 
replay all
-   WAL generated in the source server after the point of divergence.
+   After running pg_rewind the data directory is
+   not immediately in a consistent state. However
+   pg_rewind configures the control file so that 
when
+   the target server is started again it will enter recovery mode and replay 
all
+   WAL generated in the source server after the point of divergence.

The second part of the third paragraph is not changed, and the
modification you are doing here is about the control file.  I am
still unconvinced that this is a good change, because mentioning the
control file would be actually more adapted to the part "How it
works", where you are adding details about the backup_label file, and
already include details about the minimum consistency LSN itself
stored in the control file.

+   
+Because pg_rewind copies configuration files
+entirely from the source, correcting recovery configuration options before
+restarting the server is necessary if you intend to re-introduce the target
+as a replica of the source. If you restart the server after the rewind
+operation has finished but without configuring recovery, the target will
+again diverge from the primary.
+   

True that this is not outlined enough.

+  The relation files are now to their state at the last checkpoint 
completed
+  prior to the point at which the WAL timelines of the source and target
+  diverged plus the current state on the source of any blocks changed on 
the
+  target after that divergence.

"Relation files are now in a state equivalent to the moment of the
last completed checkpoint prior to the point.."?

-  pg_stat_tmp/, and
-  pg_subtrans/ are omitted from the data copied
-  from the source cluster. Any file or directory beginning with
-  pgsql_tmp is omitted, as well as are
+  pg_stat_tmp/, and pg_subtrans/
+  are omitted from the data copied from the source cluster. The files

This is just reorganizing an existing list, why?

+  Create a backup label file to begin WAL replay at the checkpoint created
+  at failover and  a minimum consistency LSN using
+  pg_current_wal_insert_lsn(), when using a live source
+  and the last checkpoint LSN, when using a stopped source.

Now would be the moment to mention the control file.

> I realized I didn't previously add this to the CF; since it's not a new
> patch I've added it to the current CF, but if this is incorrect please let
> me know.

The last CF of Postgres 13 began at the beginning of February :(
--
Michael


signature.asc
Description: PGP signature

Re: pg_rewind docs correction

2020-03-08 Thread James Coleman

On Sun, Mar 8, 2020 at 5:13 PM James Coleman  wrote:

>
> I realized I didn't previously add this to the CF; since it's not a new
> patch I've added it to the current CF, but if this is incorrect please let
> me know.
>

 Hmm, looks like I can't add it to the current one. I added it to the next
one. I think it could probably go now, since the patch is really 6 months
old, but either way is fine -- it's just a docs patch.

James

Re: pg_rewind docs correction

2020-03-08 Thread James Coleman

On Tue, Sep 17, 2019 at 9:41 PM Michael Paquier  wrote:

> On Tue, Sep 17, 2019 at 08:38:18AM -0400, James Coleman wrote:
> > I don't agree that that's a valid equivalency. I myself spent a lot of
> > time trying to understand how this could possibly be true a while
> > back, and even looked at source code to be certain. I've asked other
> > people and found the same confusion.
> >
> > As I read it the 2nd second sentence doesn't actually tell you the
> > differences; it makes a quick attempt at summarizing *how* the first
> > sentence is true, but if the first sentence isn't accurate, then it's
> > hard to read the 2nd one as helping.
>
> Well, then it comes back to the part where I am used to the existing
> docs :)
>
> > If you'd prefer something less detailed at this point at that point in
> > the docs, then something along the lines of "results in a data
> > directory state which can then be safely replayed from the source" or
> > some such.
>
> Actually this is a good suggestion, and could replace the first
> sentence of this paragraph.
>
> > The docs shouldn't be correct just for someone how already understands
> > the intricacies. And the end user shouldn't have to read the "how it
> > works" (which incidentally is kinda hidden at the bottom underneath
> > the CLI args -- perhaps we could move that?) to extrapolate things in
> > the primary documentation.
>
> Perhaps.  This doc page is not that long either.
>

I'd set this aside for quite a while, but I was looking at it again this
afternoon, and I've come to see your concern about the opening paragraphs
remaining relatively simple. To that end I believe I've come up with a
patch that's a good compromise: retaining that simplicity and being more
clear and accurate at the same time.

In the first paragraph I've updated it to refer to both "successful rewind
and subsequent WAL replay" and the result I describe as being equivalent to
the result of a base backup, since that's more technically correct anyway
(the current text could be read as implying a full out copy of the data
directory, but that's not really true just as it isn't with pg_basebackup).

I've added the information about how the backup label control file is
written, and updated the How It Works steps to refer to that separately
from restart.

Additionally the How It Works is updated to include WAL segments and new
relation files in the list of files copied wholesale, since that was
previously stated but somewhat contradicted there.

I realized I didn't previously add this to the CF; since it's not a new
patch I've added it to the current CF, but if this is incorrect please let
me know.

Thanks,
James
From 592ba15c35bb16e55b0bb0a7e7bdbb6dd4e08a0b Mon Sep 17 00:00:00 2001
From: James Coleman 
Date: Sun, 8 Mar 2020 16:39:45 -0400
Subject: [PATCH v3] Improve pg_rewind explanation and warnings

The pg_rewind docs currently assert that the state of the target's
data directory after rewind is equivalent to the source's data
directory. But that isn't quite true both because the base state is
further back in time and because the target's data directory will
include the current state on the source of any copied blocks.
Additionally the state isn't equal to a copy of the source data
directory; it's equivalent to a base backup of the source.

The How It Works section now:
- Includes details about how the backup label file is created.
- Is updated to include WAL segments and new relation files in the
  list of files copied wholesale from the source.

Finally, document clearly the state of the cluster after the operation
and also the operation sequencing dangers caused by copying
configuration files from the source.
---
 doc/src/sgml/ref/pg_rewind.sgml | 87 -
 1 file changed, 54 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..bc6f0009cc 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -48,14 +48,16 @@ PostgreSQL documentation

-   The result is equivalent to replacing the target data directory with the
-   source one. Only changed blocks from relation files are copied;
-   all other files are copied in full, including configuration files. The
-   advantage of pg_rewind over taking a new base backup, or
-   tools like rsync, is that pg_rewind does
-   not require reading through unchanged blocks in the cluster. This makes
-   it a lot faster when the database is large and only a small
-   fraction of blocks differ between the clusters.
+   After a successful rewind and subsequent WAL replay, the target data
+   directory is equivalent to a base backup of the source data directory. While
+   only changed blocks from existing relation files are copied; all other files
+   are copied in full, including new relation files, configuration files, and WAL
+   segments. The advantage of pg_rewind over taking a
+   new base backup, or tools like rsync, is t

Re: pg_rewind docs correction

2019-09-17 Thread Michael Paquier

On Tue, Sep 17, 2019 at 08:38:18AM -0400, James Coleman wrote:
> I don't agree that that's a valid equivalency. I myself spent a lot of
> time trying to understand how this could possibly be true a while
> back, and even looked at source code to be certain. I've asked other
> people and found the same confusion.
> 
> As I read it the 2nd second sentence doesn't actually tell you the
> differences; it makes a quick attempt at summarizing *how* the first
> sentence is true, but if the first sentence isn't accurate, then it's
> hard to read the 2nd one as helping.

Well, then it comes back to the part where I am used to the existing
docs :)

> If you'd prefer something less detailed at this point at that point in
> the docs, then something along the lines of "results in a data
> directory state which can then be safely replayed from the source" or
> some such.

Actually this is a good suggestion, and could replace the first
sentence of this paragraph.

> The docs shouldn't be correct just for someone how already understands
> the intricacies. And the end user shouldn't have to read the "how it
> works" (which incidentally is kinda hidden at the bottom underneath
> the CLI args -- perhaps we could move that?) to extrapolate things in
> the primary documentation.

Perhaps.  This doc page is not that long either.
--
Michael


signature.asc
Description: PGP signature

Re: pg_rewind docs correction

2019-09-17 Thread James Coleman

On Tue, Sep 17, 2019 at 3:51 AM Michael Paquier  wrote:
>
> On Sun, Sep 15, 2019 at 10:36:04AM -0400, James Coleman wrote:
> > On Sun, Sep 15, 2019 at 10:25 AM Michael Paquier  
> > wrote:
> >> +The rewind operation is not expected to result in a consistent data
> >> +directory state either internally to the node or with respect to the 
> >> rest
> >> +of the cluster. Instead the resulting data directory will only be 
> >> consistent
> >> +after WAL replay has completed to at least the LSN at which changed 
> >> blocks
> >> +copied from the source were originally written on the source.
> >>
> >> That's not necessarily true.  pg_rewind enforces in the control file
> >> of the target the minimum consistency LSN to be
> >> pg_current_wal_insert_lsn() when using a live source or the last
> >> checkpoint LSN for a stopped source, so while that sounds true from
> >> the point of view of all the blocks copied, the control file may still
> >> cause a complain that the target recovering has not reached its
> >> consistent point even if all the blocks are already at a position
> >> not-so-far from what has been registered in the control file.
> >
> > I could just say "after WAL replay has completed to a consistent state"?
>
> I still would not change this paragraph.  The first sentence means
> that we have an equivalency, because that's the case if you think
> about it as we make sure that the target is able to sync with the
> source, and the target gets into a state where it as an on-disk state
> equivalent to the target up to the minimum consistency point defined
> in the control file once the tool has done its work (this last point
> is too precise to be included in a global description to be honest).
> And the second sentence makes clear what are the actual diffs are.
> >> +   the point at which the WAL timelines of the source and target diverged 
> >> plus
> >> +   the current state on the source of any blocks changed on the target 
> >> after
> >> +   that divergence. While only changed blocks from existing relation 
> >> files are
> >>
> >> And here we could mention that all the blocks copied from the source
> >> are the ones which are found in the WAL records of the target until
> >> the end of WAL of its timeline.  Still, that's basically what is
> >> mentioned in the first part of "How It Works", which explains things
> >> better.  I honestly don't really see that all this paragraph is an
> >> improvement over the simplicity of the original when it comes to
> >> understand the global idea of what pg_rewind does.
> >
> > The problem with the original is that while simple, it's actually
> > incorrect in that simplicity. Pg_rewind does *not* result in the data
> > directory on the target matching the data directory on the source.
>
> That's not what I get from the original docs, but I may be too much
> used to it.

I don't agree that that's a valid equivalency. I myself spent a lot of
time trying to understand how this could possibly be true a while
back, and even looked at source code to be certain. I've asked other
people and found the same confusion.

As I read it the 2nd second sentence doesn't actually tell you the
differences; it makes a quick attempt at summarizing *how* the first
sentence is true, but if the first sentence isn't accurate, then it's
hard to read the 2nd one as helping.

If you'd prefer something less detailed at this point at that point in
the docs, then something along the lines of "results in a data
directory state which can then be safely replayed from the source" or
some such.

The docs shouldn't be correct just for someone how already understands
the intricacies. And the end  user shouldn't have to read the "how it
works" (which incidentally is kinda hidden at the bottom underneath
the CLI args -- perhaps we could move that?) to extrapolate things in
the primary documentation.

James Coleman

Re: pg_rewind docs correction

2019-09-17 Thread Michael Paquier

On Sun, Sep 15, 2019 at 10:36:04AM -0400, James Coleman wrote:
> On Sun, Sep 15, 2019 at 10:25 AM Michael Paquier  wrote:
>> +The rewind operation is not expected to result in a consistent data
>> +directory state either internally to the node or with respect to the 
>> rest
>> +of the cluster. Instead the resulting data directory will only be 
>> consistent
>> +after WAL replay has completed to at least the LSN at which changed 
>> blocks
>> +copied from the source were originally written on the source.
>>
>> That's not necessarily true.  pg_rewind enforces in the control file
>> of the target the minimum consistency LSN to be
>> pg_current_wal_insert_lsn() when using a live source or the last
>> checkpoint LSN for a stopped source, so while that sounds true from
>> the point of view of all the blocks copied, the control file may still
>> cause a complain that the target recovering has not reached its
>> consistent point even if all the blocks are already at a position
>> not-so-far from what has been registered in the control file.
> 
> I could just say "after WAL replay has completed to a consistent state"?

I still would not change this paragraph.  The first sentence means
that we have an equivalency, because that's the case if you think
about it as we make sure that the target is able to sync with the
source, and the target gets into a state where it as an on-disk state
equivalent to the target up to the minimum consistency point defined
in the control file once the tool has done its work (this last point
is too precise to be included in a global description to be honest).
And the second sentence makes clear what are the actual diffs are.

>> +   the point at which the WAL timelines of the source and target diverged 
>> plus
>> +   the current state on the source of any blocks changed on the target after
>> +   that divergence. While only changed blocks from existing relation files 
>> are
>>
>> And here we could mention that all the blocks copied from the source
>> are the ones which are found in the WAL records of the target until
>> the end of WAL of its timeline.  Still, that's basically what is
>> mentioned in the first part of "How It Works", which explains things
>> better.  I honestly don't really see that all this paragraph is an
>> improvement over the simplicity of the original when it comes to
>> understand the global idea of what pg_rewind does.
> 
> The problem with the original is that while simple, it's actually
> incorrect in that simplicity. Pg_rewind does *not* result in the data
> directory on the target matching the data directory on the source.

That's not what I get from the original docs, but I may be too much
used to it.

>> +  
>> +Because pg_rewind copies configuration files
>> +entirely from the source, correcting recovery configuration options 
>> before
>> +restarting the server is necessary if you intend to re-introduce the 
>> target
>> +as a replica of the source. If you restart the server after the rewind
>> +operation has finished but without configuring recovery, the target will
>> +again diverge from the primary.
>> +   
>>
>> No objections regarding that part.  Now it seems to me that we had
>> better apply that to the last part of "How it works" instead?  I kind
>> of agree that the last paragraph could provide more details regarding
>> the risks of overwriting the wanted configuration.  The existing docs
>> also mention that pg_rewind only creates a backup_label file to start
>> recovery, perhaps we could mention up to which point recovery happens
>> in this section?  There is a bit more here than just "apply the WAL".
> 
> I'll look to see if there's a better place to put this.

Thanks.  From what I can see, we could improve further the doc part
about how the tool works in details, especially regarding the
configuration files which may get overwritten and be more precise
about that.
--
Michael


signature.asc
Description: PGP signature

Re: pg_rewind docs correction

2019-09-15 Thread James Coleman

On Sun, Sep 15, 2019 at 10:25 AM Michael Paquier  wrote:
>
> On Sat, Sep 14, 2019 at 07:00:54PM -0400, James Coleman wrote:
> > Updated (plus some additional wordsmithing).
>
> +The rewind operation is not expected to result in a consistent data
> +directory state either internally to the node or with respect to the rest
> +of the cluster. Instead the resulting data directory will only be 
> consistent
> +after WAL replay has completed to at least the LSN at which changed 
> blocks
> +copied from the source were originally written on the source.
>
> That's not necessarily true.  pg_rewind enforces in the control file
> of the target the minimum consistency LSN to be
> pg_current_wal_insert_lsn() when using a live source or the last
> checkpoint LSN for a stopped source, so while that sounds true from
> the point of view of all the blocks copied, the control file may still
> cause a complain that the target recovering has not reached its
> consistent point even if all the blocks are already at a position
> not-so-far from what has been registered in the control file.

I could just say "after WAL replay has completed to a consistent state"?

> +   the point at which the WAL timelines of the source and target diverged 
> plus
> +   the current state on the source of any blocks changed on the target after
> +   that divergence. While only changed blocks from existing relation files 
> are
>
> And here we could mention that all the blocks copied from the source
> are the ones which are found in the WAL records of the target until
> the end of WAL of its timeline.  Still, that's basically what is
> mentioned in the first part of "How It Works", which explains things
> better.  I honestly don't really see that all this paragraph is an
> improvement over the simplicity of the original when it comes to
> understand the global idea of what pg_rewind does.

The problem with the original is that while simple, it's actually
incorrect in that simplicity. Pg_rewind does *not* result in the data
directory on the target matching the data directory on the source.

> +  
> +Because pg_rewind copies configuration files
> +entirely from the source, correcting recovery configuration options 
> before
> +restarting the server is necessary if you intend to re-introduce the 
> target
> +as a replica of the source. If you restart the server after the rewind
> +operation has finished but without configuring recovery, the target will
> +again diverge from the primary.
> +   
>
> No objections regarding that part.  Now it seems to me that we had
> better apply that to the last part of "How it works" instead?  I kind
> of agree that the last paragraph could provide more details regarding
> the risks of overwriting the wanted configuration.  The existing docs
> also mention that pg_rewind only creates a backup_label file to start
> recovery, perhaps we could mention up to which point recovery happens
> in this section?  There is a bit more here than just "apply the WAL".

I'll look to see if there's a better place to put this.

James Coleman

Re: pg_rewind docs correction

2019-09-15 Thread Michael Paquier

On Sat, Sep 14, 2019 at 07:00:54PM -0400, James Coleman wrote:
> Updated (plus some additional wordsmithing).

+The rewind operation is not expected to result in a consistent data
+directory state either internally to the node or with respect to the rest
+of the cluster. Instead the resulting data directory will only be 
consistent
+after WAL replay has completed to at least the LSN at which changed blocks
+copied from the source were originally written on the source.

That's not necessarily true.  pg_rewind enforces in the control file
of the target the minimum consistency LSN to be
pg_current_wal_insert_lsn() when using a live source or the last
checkpoint LSN for a stopped source, so while that sounds true from
the point of view of all the blocks copied, the control file may still
cause a complain that the target recovering has not reached its
consistent point even if all the blocks are already at a position
not-so-far from what has been registered in the control file.

+   the point at which the WAL timelines of the source and target diverged plus
+   the current state on the source of any blocks changed on the target after
+   that divergence. While only changed blocks from existing relation files are

And here we could mention that all the blocks copied from the source
are the ones which are found in the WAL records of the target until
the end of WAL of its timeline.  Still, that's basically what is
mentioned in the first part of "How It Works", which explains things
better.  I honestly don't really see that all this paragraph is an
improvement over the simplicity of the original when it comes to
understand the global idea of what pg_rewind does.

+  
+Because pg_rewind copies configuration files
+entirely from the source, correcting recovery configuration options before
+restarting the server is necessary if you intend to re-introduce the target
+as a replica of the source. If you restart the server after the rewind
+operation has finished but without configuring recovery, the target will
+again diverge from the primary.
+   

No objections regarding that part.  Now it seems to me that we had
better apply that to the last part of "How it works" instead?  I kind
of agree that the last paragraph could provide more details regarding
the risks of overwriting the wanted configuration.  The existing docs
also mention that pg_rewind only creates a backup_label file to start
recovery, perhaps we could mention up to which point recovery happens
in this section?  There is a bit more here than just "apply the WAL".
--
Michael


signature.asc
Description: PGP signature

Re: pg_rewind docs correction

2019-09-14 Thread James Coleman

On Sat, Sep 14, 2019 at 12:20 AM Michael Paquier  wrote:
>
> On Fri, Sep 13, 2019 at 01:47:03PM -0400, James Coleman wrote:
> > So I've attached a patch to summarize more correctly as well as
> > document clearly the state of the cluster after the operation and also
> > the operation sequencing dangers caused by copying configuration
> > files from the source.
>
> +   After a successful rewind, the target data directory is equivalent
> to the
> +   to the state of the data directory at the point at which the
> source and
> +   target diverged plus the current state on the source of any blocks
> changed
> +   on the target after that divergence. While only changed blocks
> from relation
> +   files are copied; all other files are copied in full, including
> configuration
> +   files and WAL segments. The advantage of
> pg_rewind
> +   over taking a new base backup, or tools like
> rsync,
> +   is that pg_rewind does not require
> comparing or
> +   copying unchanged relation blocks in the cluster. As such the
> rewind operation
> +   is significantly faster than other approaches when the database is
> large and
> +   only a small fraction of blocks differ between the clusters.
>
> The point of divergence could be defined as the LSN position where WAL
> has forked on the new timeline, but the block diffs are copied from
> actually the last checkpoint just before WAL has forked.  So this new
> paragraph brings confusion about the actual divergence point.
>
> Regarding the relation files, if the file does not exist on the target
> but does exist on the source, it is also copied fully, so the second
> sentence is wrong here to mention as relation files could also be
> copied fully.

Updated (plus some additional wordsmithing).

James Coleman


001_pg_rewind_explanation_doc_fixes_v2.patch
Description: Binary data

Re: pg_rewind docs correction

2019-09-13 Thread Michael Paquier

On Fri, Sep 13, 2019 at 01:47:03PM -0400, James Coleman wrote:
> So I've attached a patch to summarize more correctly as well as
> document clearly the state of the cluster after the operation and also
> the operation sequencing dangers caused by copying configuration
> files from the source.

+   After a successful rewind, the target data directory is equivalent
to the
+   to the state of the data directory at the point at which the
source and
+   target diverged plus the current state on the source of any blocks
changed
+   on the target after that divergence. While only changed blocks
from relation
+   files are copied; all other files are copied in full, including
configuration
+   files and WAL segments. The advantage of
pg_rewind
+   over taking a new base backup, or tools like
rsync,
+   is that pg_rewind does not require
comparing or
+   copying unchanged relation blocks in the cluster. As such the
rewind operation
+   is significantly faster than other approaches when the database is
large and
+   only a small fraction of blocks differ between the clusters.

The point of divergence could be defined as the LSN position where WAL
has forked on the new timeline, but the block diffs are copied from
actually the last checkpoint just before WAL has forked.  So this new
paragraph brings confusion about the actual divergence point.

Regarding the relation files, if the file does not exist on the target
but does exist on the source, it is also copied fully, so the second
sentence is wrong here to mention as relation files could also be
copied fully.
--
Michael


signature.asc
Description: PGP signature

pg_rewind docs correction

2019-09-13 Thread James Coleman

The pg_rewind docs assert that the state of the target's data directory
after rewind is equivalent to the source's data directory. But that
isn't true both because the base state is further back in time and
because the target's data directory will include the current state on
the source of any copied blocks.

So I've attached a patch to summarize more correctly as well as
document clearly the state of the cluster after the operation and also
the operation sequencing dangers caused by copying configuration
files from the source.

James Coleman


001_pg_rewind_explanation_doc_fixes.patch
Description: Binary data

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

Re: pg_rewind docs correction

pg_rewind docs correction

17 matches

Site Navigation

Mail list logo

Footer information