Re: Offline enabling/disabling of data checksums

2019-03-29 Thread Michael Paquier
On Tue, Mar 26, 2019 at 01:41:38PM +0100, Fabien COELHO wrote:
>> I am not sure that "checksum status" is a correct term.  It seems to
>> me that "same configuration for data checksums as before the tool ran"
>> or something like that would be more correct.
> 
> Possibly, I cannot say.

I have put more thought into this part, and committed the
reorganization as you mainly suggested.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-26 Thread Fabien COELHO


Bonjour Michaël,


Here is an attempt at improving the Notes. [...]


So, the ordering of the notes for each paragraph is as follows: 1) 
Replication issues when mixing different checksum setups across nodes. 
2) Consistency of the operations if killed. 3) Don't start Postgres 
while the operation runs.


Your proposal is to switch the order of the paragraphs to 3), 1) and
then 2).


Yes. I suggest emphasizing cluster corruption risks by putting them 
first.



Do others have any opinion?  I am fine with the current
order of things, still it may make sense to tweak the docs.

In the paragraph related to replication, the second statement is
switched to be first so that the docs warn first, and then give
recommendations.


Yep.


This part makes sense.


Yep!


I am not sure that "checksum status" is a correct term.  It seems to
me that "same configuration for data checksums as before the tool ran"
or something like that would be more correct.


Possibly, I cannot say.

--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-26 Thread Michael Paquier
On Sat, Mar 23, 2019 at 02:14:02PM +0100, Fabien COELHO wrote:
> Here is an attempt at improving the Notes.
> 
> Mostly it is a reordering from more important (cluster corruption) to less
> important (if interrupted a restart is needed), some reordering from problem
> to solutions instead of solution/problem/solution, some sentence
> simplification.

So, the ordering of the notes for each paragraph is as follows:
1) Replication issues when mixing different checksum setups across
nodes.
2) Consistency of the operations if killed.
3) Don't start Postgres while the operation runs.

Your proposal is to switch the order of the paragraphs to 3), 1) and
then 2).  Do others have any opinion?  I am fine with the current
order of things, still it may make sense to tweak the docs.

In the paragraph related to replication, the second statement is
switched to be first so that the docs warn first, and then give
recommendations.  This part makes sense.

I am not sure that "checksum status" is a correct term.  It seems to
me that "same configuration for data checksums as before the tool ran"
or something like that would be more correct.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-23 Thread Fabien COELHO


Bonjour Michaël,

Here is an attempt at improving the Notes.

Mostly it is a reordering from more important (cluster corruption) to less 
important (if interrupted a restart is needed), some reordering from 
problem to solutions instead of solution/problem/solution, some sentence 
simplification.


--
Fabien.

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 1f4d4ab8b4..869d742aae 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -179,29 +179,27 @@ PostgreSQL documentation
  
   Notes
   
-   When disabling or enabling checksums in a replication setup of multiple
-   clusters, it is recommended to stop all the clusters before doing
-   the switch to all the clusters consistently. When using a replication
-   setup with tools which perform direct copies of relation file blocks
-   (for example <xref linkend="app-pgrewind"/>), enabling or disabling
-   checksums can lead to page corruptions in the shape of incorrect
-   checksums if the operation is not done consistently across all nodes.
-   Destroying all the standbys in the setup first, enabling or disabling
-   checksums on the primary and finally recreating the standbys from
-   scratch is also safe.
+   Enabling checksums in a large cluster can potentially take a long time.
+   During this operation, the cluster or other programs that write to the
+   data directory must not be started or else data loss may occur.
   
   
-   If pg_checksums is aborted or killed in
-   its operation while enabling or disabling checksums, the cluster
-   will have the same state with respect of checksums as before the
-   operation and pg_checksums needs to be
-   restarted.
+   When using a replication setup with tools which perform direct copies
+   of relation file blocks (for example <xref linkend="app-pgrewind"/>),
+   enabling or disabling checksums can lead to page corruptions in the
+   shape of incorrect checksums if the operation is not done consistently
+   across all nodes.
+   For enabling or disabling checksums in a replication setup,
+   it is thus recommended to stop all the clusters before switching
+   them all consistently.
+   Destroying all standbys, performing the operation on the primary and
+   finally recreating the standbys from scratch is also safe.
   
   
-   When enabling checksums in a cluster, the operation can potentially
-   take a long time if the data directory is large. During this operation,
-   the cluster or other programs that write to the data directory must not
-   be started or else data loss may occur.
-   
+   If pg_checksums is aborted or killed
+   while enabling or disabling checksums, the cluster will have the
+   same checksum status as before the operation and
+   pg_checksums needs to be restarted.
+  
  
 


Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 03:18:26PM +0100, Fabien COELHO wrote:
> Attached is a quick patch about "pg_rewind", so that the control file is
> updated after everything else is committed to disk.

Could you start a new thread about that please?  This one has already
been used for too many things.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Sat, Mar 23, 2019 at 08:16:07AM +0900, Michael Paquier wrote:
> And committed the main part.  I'll look after the --no-sync part in a
> bit.

--no-sync is committed as well now.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 07:02:36PM +0100, Fabien COELHO wrote:
> Indeed it does, and it is done in update_controlfile if the last argument is
> true. Basically, the latest version of update_controlfile always fsyncs the
> control file, unless explicitly told not to do so. The options to skip that
> are really there only to speed up the regression tests.

For the control file, it would not really matter much, and the cost
would really come from syncing the data directory; still, for
correctness it is better to have a full all-or-nothing switch.  Small
buildfarm machines also like the --no-sync flavors a lot.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 02:59:31PM +0100, Fabien COELHO wrote:
> On write(), the error message is not translatable whereas it is for all
> others.

Fixed.

> I agree that a BIG STRONG warning is needed about not starting the cluster
> under pain of possible data corruption. I still think that preventing this
> is desirable, preferably before v12.

For now the docs mention that in a paragraph as Michael Banck has
suggested.  Not sure that this deserves a warning portion.

> Otherwise, my remaining non showstopper (committer's opinion matters more)
> issues:
> 
> Doc: A postgres cluster is like an Oracle instance. I'd use "replicated
> setup" instead of "cluster", and "cluster" instead of "instance. I'll try to
> improve the text, possibly over the week-end.

Right.  I have reworded that using your suggestions.

And committed the main part.  I'll look after the --no-sync part in a
bit.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Fabien COELHO



Hello Christoph,


-   pg_log(PG_PROGRESS, "syncing target data directory\n");
-   syncTargetDirectory();


Doesn't the control file still need syncing?


Indeed it does, and it is done in update_controlfile if the last argument 
is true. Basically, the latest version of update_controlfile always fsyncs 
the control file, unless explicitly told not to do so. The options to skip 
that are really there only to speed up the regression tests.
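For illustration, here is a minimal POSIX sketch of that write-then-fsync
pattern (illustrative names only, not the actual code in
src/common/controldata_utils.c): the file content is flushed first, and
the parent directory entry is flushed as well so the update survives a
crash.

#include <fcntl.h>
#include <libgen.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper mirroring the update_controlfile() pattern. */
static int
durable_update(const char *path, const void *buf, size_t len, bool do_sync)
{
	char	dir[1024];
	int		fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t) len ||
		(do_sync && fsync(fd) != 0))
	{
		close(fd);
		return -1;
	}
	close(fd);

	if (do_sync)
	{
		/* Flush the directory entry too, or the file creation
		 * itself may not be durable. */
		strncpy(dir, path, sizeof(dir) - 1);
		dir[sizeof(dir) - 1] = '\0';
		fd = open(dirname(dir), O_RDONLY);
		if (fd < 0 || fsync(fd) != 0)
		{
			if (fd >= 0)
				close(fd);
			return -1;
		}
		close(fd);
	}
	return 0;
}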


--
Fabien.



Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Christoph Berg
Re: Fabien COELHO 2019-03-22 
> Attached is a quick patch about "pg_rewind", so that the control file is
> updated after everything else is committed to disk.

>   update_controlfile(datadir_target, progname, &ControlFile_new, do_sync);
>  
> - pg_log(PG_PROGRESS, "syncing target data directory\n");
> - syncTargetDirectory();

Doesn't the control file still need syncing?

Christoph



Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Fabien COELHO



Done the switch for this case.  For pg_rewind actually I think that this 
is an area where its logic could be improved a bit. So first the data 
folder is synced, and then the control file is updated.


Attached is a quick patch about "pg_rewind", so that the control file is 
updated after everything else is committed to disk.


--
Fabien.

diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 3dcadb9b40..c1e6d7cd07 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -351,9 +351,12 @@ main(int argc, char **argv)
 
 	progress_report(true);
 
-	pg_log(PG_PROGRESS, "\ncreating backup label and updating control file\n");
+	pg_log(PG_PROGRESS, "\ncreating backup label\n");
 	createBackupLabel(chkptredo, chkpttli, chkptrec);
 
+	pg_log(PG_PROGRESS, "syncing target data directory\n");
+	syncTargetDirectory();
+
 	/*
 	 * Update control file of target. Make it ready to perform archive
 	 * recovery when restarting.
@@ -362,6 +365,7 @@ main(int argc, char **argv)
 	 * source server. Like in an online backup, it's important that we recover
 	 * all the WAL that was generated while we copied the files over.
 	 */
+	pg_log(PG_PROGRESS, "updating control file\n");
 	memcpy(&ControlFile_new, &ControlFile_source, sizeof(ControlFileData));
 
 	if (connstr_source)
@@ -377,11 +381,9 @@ main(int argc, char **argv)
 	ControlFile_new.minRecoveryPoint = endrec;
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
+
 	update_controlfile(datadir_target, progname, &ControlFile_new, do_sync);
 
-	pg_log(PG_PROGRESS, "syncing target data directory\n");
-	syncTargetDirectory();
-
 	printf(_("Done!\n"));
 
 	return 0;


Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Fabien COELHO


Bonjour Michaël,


Does that look fine to you?


Mostly.

Patch v9 part 1 applies cleanly, compiles, global and local check ok, doc 
build ok.


On write(), the error message is not translatable whereas it is for all 
others.


I agree that a BIG STRONG warning is needed about not starting the cluster 
under pain of possible data corruption. I still think that preventing this 
is desirable, preferably before v12.



Otherwise, my remaining non-showstopper (committer's opinion matters more) 
issues:


Doc: A postgres cluster is like an Oracle instance. I'd use "replicated 
setup" instead of "cluster", and "cluster" instead of "instance. I'll try 
to improve the text, possibly over the week-end.


Enabling/disabling an already enabled/disabled cluster should not be a 
failure for me, because the cluster is left in the prescribed state.


Patch v9 part 2 applies cleanly, compiles, global and local check ok, doc 
build ok.


Ok for me.

--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 10:04:02AM +0100, Michael Banck wrote:
> How about this:
> 
> + 
> +  Notes
> +  
> +   When enabling checksums in a cluster, the operation can potentially take a
> +   long time if the data directory is large.  During this operation, the 
> +   cluster or other programs that write to the data directory must not be 
> +   started or else data-loss will occur.
> +  

Sounds fine to me.  Will add an extra paragraph on that.

> Maybe it could be formulated like "If pg_checksums is aborted or killed
> in its operation while enabling or disabling checksums, the cluster
> will have the same state with respect to checksums as before the
> operation and pg_checksums needs to be restarted."?

We could use that as well.  With the current patch, and per the
suggestions from Fabien, this holds true as the control file is
updated and flushed last.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Banck
Hi,

On Friday 2019-03-22 at 17:37 +0900, Michael Paquier wrote:
> On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote:
> > Don't we need a big warning that the cluster must not be started during
> > operation of pg_checksums as well, now that we don't disallow it?
> 
> The same applies to pg_rewind and pg_basebackup, so I would classify
> that as a pilot error.  

How would it apply to pg_basebackup? The cluster is running while the
base backup is taken and I believe the control file is written at the
end so you can't start another instance off the backup directory until
the base backup has finished.

It would apply to pg_rewind, but pg_rewind's runtime does not scale with
cluster size, does it? pg_checksums will run for hours on large clusters
so the window for errors is much larger and I don't think you can easily
compare the two.

> How would you formulate that in the docs if you add it?

(I would try to make sure you can't start the cluster but that seems off
the table for now)

How about this:

+ 
+  Notes
+  
+   When enabling checksums in a cluster, the operation can potentially take a
+   long time if the data directory is large.  During this operation, the 
+   cluster or other programs that write to the data directory must not be 
+   started or else data-loss will occur.
+  
+
+  
+   When disabling or enabling checksums in a cluster of multiple instances,
[...]

Also, the following is not very clear to me:

+   If the event of a crash of the operating system while enabling or

s/If/In/

+   disabling checksums, the data folder may have checksums in an inconsistent
+   state, in which case it is recommended to check the state of checksums
+   in the data folder.

How is the user supposed to check the state of checksums? Do you mean
that if the user intended to enable checksums and the box dies in
between, they should check whether checksums are actually enabled and
re-run if not?  Because it could also mean running pg_checksums --check
on the cluster, which wouldn't work in that case as the control file has
not been updated yet.

Maybe it could be formulated like "If pg_checksums is aborted or killed
in its operation while enabling or disabling checksums, the cluster
will have the same state with respect to checksums as before the
operation and pg_checksums needs to be restarted."?


Michael

-- 
Michael Banck
credativ GmbH



Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote:
> Don't we need a big warning that the cluster must not be started during
> operation of pg_checksums as well, now that we don't disallow it?

The same applies to pg_rewind and pg_basebackup, so I would classify
that as a pilot error.  How would you formulate that in the docs if
you add it?
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Banck
Hi,

On Friday 2019-03-22 at 09:27 +0900, Michael Paquier wrote:
> I have added in the docs a warning about a host crash while doing the 
> operation, with a recommendation to check the state of the checksums
> on the data folder should it happen, and the previous portion of the
> docs about clusters.  Your suggestion sounds adapted.  I would be
> tempted to add a bigger warning in pg_rewind or pg_basebackup about
> that, but that's a different story for another time.
> 
> Does that look fine to you?

Don't we need a big warning that the cluster must not be started during
operation of pg_checksums as well, now that we don't disallow it?


Michael

-- 
Michael Banck
credativ GmbH



Re: Offline enabling/disabling of data checksums

2019-03-21 Thread Michael Paquier
On Thu, Mar 21, 2019 at 08:17:32AM +0100, Fabien COELHO wrote:
> I can try, but I must admit that I'm fuzzy about the actual issue. Is there
> a problem with streaming replication and inconsistent checksum settings, or
> not?
> 
> You seem to suggest that the issue is more about how some commands or backup
> tools operate on a cluster.

Yes.  That's what I am writing about.  Imagine for example this case
with pg_rewind:
- primary has checksums enabled.
- standby has checksums disabled.
- a hard crash of the primary happens, there is a failover to the
standby which gets promoted.
- The primary's host is restarted, and it is started and stopped once
cleanly to get a clear point in its past timeline where WAL forked,
thanks to the shutdown checkpoint generated by the clean stop.
- pg_rewind is run, copying some pages from the promoted standby,
which don't have checksums, to the primary with checksums enabled, and
causing some pages to have an incorrect checksum.
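To make the failure mode concrete, here is a minimal sketch of the
verification step that such a copied block trips.  It assumes building
inside the PostgreSQL source tree, pulling in the checksum implementation
the same way pg_checksums.c does; the helper name is mine.

#include "postgres_fe.h"

#include "storage/bufpage.h"
#include "storage/checksum.h"
#include "storage/checksum_impl.h"

/*
 * A block flushed while checksums were disabled carries pd_checksum == 0,
 * a value pg_checksum_page() can never return, so verification on a
 * checksums-enabled node always fails for such a page.
 */
static bool
page_checksum_ok(char *page, BlockNumber blkno)
{
	PageHeader	phdr = (PageHeader) page;

	/* New (all-zero) pages carry no checksum and are skipped. */
	if (PageIsNew((Page) page))
		return true;

	return phdr->pd_checksum == pg_checksum_page(page, blkno);
}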

There is another tool I know of which is called pg_rman, which is a
backup tool able to take incremental backups in the shape of a delta
of relation blocks.  Then imagine the following:
- One instance of Postgres runs, has checksums disabled.
- pg_rman takes a full backup of it.
- Checksums are enabled on this instance.
- An incremental backup from the previous full backup point is taken.
If I recall correctly pg_rman takes a copy of the new control file as
well, which tracks checksums as being enabled.
- A crash happens, the data folder is dead.
- Rollback to the previous backup is done, and we restore up to a
point after the incremental backup.
- And you finish with a cluster which has checksums enabled, but as
the initial full backup had checksums disabled, not all the pages may
be in a correct state.

So I think that it makes sense to advise caution in the
documentation, but being too careful in the tool also discards
many possibilities (see the example of the clean failover where it is
possible to enable checksums with no actual downtime).  And this part
has a lot of value.

> ISTM that this would not work: The control file update can only be done
> *after* the fsync, to describe the cluster's actual status; otherwise it is
> just a question of luck whether the cluster is corrupt on a crash while
> fsyncing. The enforced order of operations, with a barrier in between, is the
> important thing here.

Done the switch for this case.  For pg_rewind actually I think that
this is an area where its logic could be improved a bit.  So first
the data folder is synced, and then the control file is updated.  It
took less time to change the code than to write this paragraph,
including the code compilation and one run of the TAP tests,
confirmed.

I have added in the docs a warning about a host crash while doing the 
operation, with a recommendation to check the state of the checksums
on the data folder should it happen, and the previous portion of the
docs about clusters.  Your suggestion sounds adapted.  I would be
tempted to add a bigger warning in pg_rewind or pg_basebackup about
that, but that's a different story for another time.

Does that look fine to you?
--
Michael
From 7ef4a14421bd148fa24f3107e8ef8be8b348 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 22 Mar 2019 09:21:18 +0900
Subject: [PATCH v9 1/2] Add options to enable and disable checksums in
 pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When running --enable or --disable, the data folder gets fsync'd for
durability.  If no mode is specified in the options, then --check is
used for compatibility with older versions of pg_verify_checksums (now
renamed to pg_checksums in v12).

Author: Michael Banck
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20181221201616.gd4...@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml|  72 ++-
 src/bin/pg_checksums/pg_checksums.c   | 175 ++
 src/bin/pg_checksums/t/002_actions.pl |  76 ---
 src/tools/pgindent/typedefs.list  |   1 +
 4 files changed, 278 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..fda85e7ea0 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  
   pg_checksums
-  verify data checksums in a PostgreSQL database cluster
+  enable, disable or check data checksums in a PostgreSQL database cluster
  
 
  
@@ -36,10 +36,19 @@ PostgreSQL documentation
  
   Description
   
-   pg_checksums verifies data checksums in a
-   PostgreSQL 

Re: Offline enabling/disabling of data checksums

2019-03-21 Thread Fabien COELHO



Anyway, as this stuff is very useful for upgrade scenarios 
a-la-pg_upgrade, for backup validation and as it does not produce false 
positives, I would really like to get something committed for v12 in its 
simplest form...


Fine with me, the detailed doc is not a showstopper and can be improved 
later on.


Are there any recommendations that people would like to add to the 
documentation?


For me, I just want at least a clear warning on potential risks.

--
Fabien.



Re: Offline enabling/disabling of data checksums

2019-03-21 Thread Fabien COELHO


Bonjour Michaël,


On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:

I think that the motivation/risks should appear before the solution. "As xyz
..., ...", or there at least the logical link should be outlined.

It is not clear to me whether the following sentences, which seem 
to "pg_rewind", are linked to the previous advice, which seems rather to
refer to streaming replication?


Do you have a better idea of formulation?


I can try, but I must admit that I'm fuzzy about the actual issue. Is 
there a problem with streaming replication and inconsistent checksum 
settings, or not?


You seem to suggest that the issue is more about how some commands or 
backup tools operate on a cluster.


I'll reread the thread carefully and will make a proposal.


Imagine for example a primary-standby with checksums disabled: [...]


Yep, that's cool.


Should not disabling in reverse order be safe? The checksums are not checked
afterwards?


I don't quite understand your comment about the ordering.  If all the 
standbys are destroyed first, then enabling/disabling checksums happens 
at a single place.


Sure. I was suggesting that disabling on replicated clusters is possibly 
safer, but I do not know the details of replication & checksumming with 
enough precision to be that sure about it.



After the reboot, some data files are not fully updated with their
checksums, although the control file says that they are. It should then
fail after a restart when a no-checksum page is loaded?

What am I missing?


Please note that we do that in other tools as well and we live fine
with that in pg_basebackup and pg_rewind, just to name two.


The fact that other commands are exposed to the same potential risk is not 
a very good argument not to fix it.


I am not saying that it is not a problem in some cases, but I am saying 
that this is not a problem that this patch should solve.


As solving the issue involves exchanging two lines and turning one boolean 
parameter to true, I do not see why it should not be done. Fixing the 
issue takes much less time than writing about it...


And if other commands can be improved, fine with me.

If we were to do something about that, it could make sense to make 
fsync_pgdata() smarter so that the control file is flushed last there, or 
define flush strategies there.


ISTM that this would not work: The control file update can only be done 
*after* the fsync, to describe the cluster's actual status; otherwise it is 
just a question of luck whether the cluster is corrupt on a crash while 
fsyncing. The enforced order of operations, with a barrier in between, is 
the important thing here.
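In code form, the enforced order would look roughly like this (a sketch
only, using the frontend helpers with the signatures they have in this
thread's patches; the function name and the extern declarations are
assumptions):

#include "postgres_fe.h"

#include "catalog/pg_control.h"
#include "common/controldata_utils.h"
#include "common/file_utils.h"
#include "storage/bufpage.h"

/* assumed provided elsewhere, as in pg_checksums.c */
extern const char *progname;
extern ControlFileData *ControlFile;

static void
finalize_enable(const char *datadir)
{
	/* 1. Barrier: every rewritten data block must be durable first. */
	fsync_pgdata(datadir, progname, PG_VERSION_NUM);

	/* 2. Only then flip the switch, flushing the control file as well. */
	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
	update_controlfile(datadir, progname, ControlFile, true);
}

If a crash happens before step 2, the control file still says that
checksums are disabled and the tool simply has to be restarted; there is
no window where the control file over-promises.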


--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-21 Thread Michael Paquier
On Thu, Mar 21, 2019 at 07:59:24AM +0900, Michael Paquier wrote:
> Please note that we do that in other tools as well and we live fine
> with that in pg_basebackup and pg_rewind, just to name two.  I am not
> saying that it is not a problem in some cases, but I am saying that
> this is not a problem that this patch should solve.  If we were to do
> something about that, it could make sense to make fsync_pgdata()
> smarter so that the control file is flushed last there, or define flush
> strategies there.

Anyway, as this stuff is very useful for upgrade scenarios
a-la-pg_upgrade, for backup validation and as it does not produce
false positives, I would really like to get something committed for
v12 in its simplest form...  Are there any recommendations that people
would like to add to the documentation?
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Michael Paquier
On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:
> I think that the motivation/risks should appear before the solution. "As xyz
> ..., ...", or there at least the logical link should be outlined.
> 
> It is not clear to me whether the following sentences, which seem specific
> to "pg_rewind", are linked to the previous advice, which seems rather to
> refer to streaming replication?

Do you have a better idea of formulation?  If you have a failover
which makes use of pg_rewind, or use some backup tool which does
incremental copies of physical blocks like pg_rman, then you risk
messing up instances in a cluster, which is the risk I am trying to
outline here.  It is actually fine to do the following if
everything is WAL-based, as checksums are only computed once a shared
buffer is flushed on a single instance.  Imagine for example a
primary-standby with checksums disabled:
- Shutdown cleanly standby, enable checksums.
- Plug back standby to cluster, let it replay up to the latest point.
- Shutdown cleanly primary.
- Promote standby.
- Enable checksums on the previous primary.
- Add recovery parameters and reconnect the primary back to the
cluster.
- All nodes have checksums enabled, without rebuilding any of them.
Per what I could see on this thread, folks tend to point out that
we should *not* include such recommendations in the docs, as it is too
easy to make a pilot error.

> > +   When using a cluster with
> > +   tools which perform direct copies of relation file blocks (for example
> > +   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
> > +   lead to page corruptions in the shape of incorrect checksums if the
> > +   operation is not done consistently across all nodes.  Destroying all
> > +   the standbys in a cluster first, enabling or disabling checksums on
> > +   the primary and finally recreate the cluster nodes from scratch is
> > +   also safe.
> > +  
> > + 
> 
> Should not disabling in reverse order be safe? The checksums are not checked
> afterwards?

I don't quite understand your comment about the ordering.  If all
the standbys are destroyed first, then enabling/disabling checksums
happens at a single place.

> After the reboot, some data files are not fully updated with their
> checksums, although the control file says that they are. It should then
> fail after a restart when a no-checksum page is loaded?
> 
> What am I missing?

Please note that we do that in other tools as well and we live fine
with that in pg_basebackup and pg_rewind, just to name two.  I am not
saying that it is not a problem in some cases, but I am saying that
this is not a problem that this patch should solve.  If we were to do
something about that, it could make sense to make fsync_pgdata()
smarter so that the control file is flushed last there, or define flush
strategies there.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO


Michaël-san,


I think that a clear warning not to run any cluster command in parallel,
under pain of possible cluster corruption, and possibly other caveats about
replication, should appear in the documentation.


I still have the following extra documentation in my notes:


Ok, it should have been included in the patch.


+ 
+  Notes
+  
+   When disabling or enabling checksums in a cluster of multiple instances,


ISTM that a "postgres cluster" is like an Oracle instance:

See https://www.postgresql.org/docs/devel/creating-cluster.html

So the vocabulary used above seems misleading. I'm not sure how to name an 
Oracle cluster in postgres lingo, though.



+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently.


I think that the motivation/risks should appear before the solution. "As 
xyz ..., ...", or at least the logical link should be outlined.


It is not clear to me whether the following sentences, which seem 
specific to "pg_rewind", are linked to the previous advice, which seems 
rather to refer to streaming replication?



+   When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  
+ 


Should not disabling in reverse order be safe? The checksums are not 
checked afterwards?



This sounds kind of enough for me but..


ISTM that the risks could be stated more clearly.


I also think that some mechanism should be used to prevent such occurrence,
whether in this patch or another.


I won't counter-argue on that.


This answer is ambiguous.


I still think that the control file update should be made *after* the data
directory has been synced, otherwise the control file could be updated while
some data files are not. It just means exchanging two lines.


The parent path of the control file also needs to be flushed to make
the change durable; just doing things the same way pg_rewind does keeps the
code simple in my opinion.


I do not understand. The issue I'm referring to is, when enabling:

 - data files are updated in fs cache
 - control file is updated in fs cache
 * fsync is called
 - updated control file gets written
 - data files are being written...
   but reboot occurs while fsyncing is still in progress

After the reboot, some data files are not fully updated with their 
checksums, although the control file says that they are. It should then 
fail after a restart when a no-checksum page is loaded?


What am I missing?

Also, I do not see how exchanging two lines makes the code less simple.


If the conditional sync is moved before the file update, the file update
should pass do_sync to update_controlfile.


For the durability of the operation, the current order is intentional.


See my point above: I think that this order can lead to cluster 
corruption. If not, please be kind enough to try to explain in more 
detail why I'm wrong.


--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Michael Paquier
On Wed, Mar 20, 2019 at 10:38:36AM +0100, Fabien COELHO wrote:
> Hmmm… so nothing:-)

The core of the feature is still here, fortunately.

> I think that a clear warning not to run any cluster command in parallel,
> under pain of possible cluster corruption, and possibly other caveats about
> replication, should appear in the documentation.

I still have the following extra documentation in my notes:
+ 
+  Notes
+  
+   When disabling or enabling checksums in a cluster of multiple instances,
+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently.  When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  
+ 
This sounds kind of enough for me but..

> I also think that some mechanism should be used to prevent such occurrence,
> whether in this patch or another.

I won't counter-argue on that.

> I still think that the control file update should be made *after* the data
> directory has been synced, otherwise the control file could be updated while
> some data files are not. It just means exchanging two lines.

The parent path of the control file also needs to be flushed to make
the change durable; just doing things the same way pg_rewind does keeps the
code simple in my opinion.

> If the conditional sync is moved before the file update, the file update
> should pass do_sync to update_controlfile.

For the durability of the operation, the current order is
intentional.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO


Michaël-san,


In short, you keep the main feature with:
- No tweaks with postmaster.pid.
- Rely just on the control file indicating an instance shutdown
cleanly.
- No tweaks with the system ID.
- No renaming of the control file.


Hmmm… so nothing:-)

I think that this feature is useful, as a complement to a possible 
online-enabling server-managed version.


About patch v8 part 1: applies cleanly, compiles, global & local make 
check ok, doc build ok.


I think that a clear warning not to run any cluster command in parallel, 
under pain of possible cluster corruption, and possibly other caveats 
about replication, should appear in the documentation.


I also think that some mechanism should be used to prevent such occurrence, 
whether in this patch or another.


About "If enabling or disabling checksums, the exit status is nonzero if 
the operation failed." I must admit that enabling/disabling an already 
enabled/disabled cluster is still not really a failure for me, because the 
end status is that the cluster is in the state required by the user.
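Expressed against the patch's own identifiers (mode, PG_MODE_ENABLE,
ControlFile), the behavior I have in mind would be a hypothetical early
exit rather than an error, something like:

	/* Hypothetical sketch, not the patch's behavior: treat an
	 * already-enabled cluster as success, since the end state
	 * matches what the user asked for. */
	if (mode == PG_MODE_ENABLE &&
		ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
	{
		printf(_("data checksums are already enabled in cluster\n"));
		exit(0);
	}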


I still think that the control file update should be made *after* the data 
directory has been synced, otherwise the control file could be updated 
while some data files are not. It just means exchanging two lines.



About patch v8 part 2: applies cleanly, compiles, global & local make 
check ok.


The added -N option is not documented.

If the conditional sync is moved before the file update, the file update 
should pass do_sync to update_controlfile.


--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO



Hallo Andres,


[...]


pg_upgrade in link mode intentionally wants to *permanently* disable a
cluster. And it explicitly writes a log message about it. That's not a
case to draw inference for this case.


Ok. My light knowledge of pg_upgrade's inner workings does not extend to this 
level of precision.


I'd be okay with anything that works consistently across all commands 
[...]


I'll admit that I'm moderately enthusiastic about "postmaster.pid" because
it does not do anymore what the file name says, but if it really works and
is used consistently by all commands, why not. In case of unexpected
problems, the file will probably have to be removed/fixed by hand. I also
think that the implemented mechanism should be made available in
"control_utils.c", not duplicated in every command.


That's just a separate feature.


Possibly, although I'm not sure what in the above is a "separate feature", 
I assume from the "pg_checksums --enable" implementation.


Is it the fact that there could (should, IMO) be some mechanisms to ensure 
that mutually exclusive direct cluster-modification commands are not run 
concurrently?


As "pg_checksums -e" is a potentially long running command, the likelyhood 
of self-inflected wounds is raised significantly: I could do absurd things 
on an enable-checksum-in-progress cluster on a previous version of the 
patch. Thus as a reviewer I'm suggesting to fix the issue.


Or is it the fact that fixing some critical errors would possibly 
involve some manual intervention at some point?


Or is it something else?

--
Fabien.



Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Andres Freund
Hi,

On 2019-03-20 07:55:39 +0100, Fabien COELHO wrote:
> > And you're basically adding it because Fabien doesn't like
> > postmaster.pid and wants to invent another lockout mechanism in this
> > thread.
> 
> I did not suggest to rename the control file, but as it is already done by
> another command it did not look like a bad idea in itself, or at least an
> already used bad idea:-)

pg_upgrade in link mode intentionally wants to *permanently* disable a
cluster. And it explicitly writes a log message about it. That's not a
case to draw inference for this case.


> I'd be okay with anything that works consistently across all commands that
> may touch a cluster and are mutually exclusive (postmaster, pg_rewind,
> pg_resetwal, pg_upgrade, pg_checksums…), without underlying race conditions.
> It could be locking, a control file state, a special file (which one ? what
> is the procedure to create/remove it safely and avoid potential race
> conditions ?), possibly "postmaster.pid", whatever really.
> 
> I'll admit that I'm moderately enthusiastic about "postmaster.pid" because
> it does not do anymore what the file name says, but if it really works and
> is used consistently by all commands, why not. In case of unexpected
> problems, the file will probably have to be removed/fixed by hand. I also
> think that the implemented mechanism should be made available in
> "control_utils.c", not duplicated in every command.

That's just a separate feature.

Greetings,

Andres Freund



Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO


Hallo Andres,


And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.


I did not suggest to rename the control file, but as it is already done by 
another command it did not look like a bad idea in itself, or at least an 
already used bad idea:-)


I'd be okay with anything that works consistently across all commands 
that may touch a cluster and are mutually exclusive (postmaster, pg_rewind, 
pg_resetwal, pg_upgrade, pg_checksums…), without underlying race 
conditions. It could be locking, a control file state, a special file 
(which one ? what is the procedure to create/remove it safely and avoid 
potential race conditions ?), possibly "postmaster.pid", whatever really.


I'll admit that I'm moderately enthusiastic about "postmaster.pid" because 
it does not do anymore what the file name says, but if it really works 
and is used consistently by all commands, why not. In case of unexpected 
problems, the file will probably have to be removed/fixed by hand. I also 
think that the implemented mechanism should be made available in 
"control_utils.c", not duplicated in every command.


--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Paquier
On Wed, Mar 20, 2019 at 08:09:07AM +0900, Michael Paquier wrote:
> In short, you keep the main feature with:
> - No tweaks with postmaster.pid.
> - Rely just on the control file indicating an instance shutdown
> cleanly.
> - No tweaks with the system ID.
> - No renaming of the control file.

FWIW, the simplest version is just like the attached.
--
Michael
From c71c1c8d6d5093bcea90b75cbb0c1348a823f08d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 20 Mar 2019 14:14:33 +0900
Subject: [PATCH v8 1/2] Add options to enable and disable checksums in
 pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When running --enable or --disable, the data folder gets fsync'd for
durability.  If no mode is specified in the options, then --check is
used for compatibility with older versions of pg_verify_checksums (now
renamed to pg_checksums in v12).

Author: Michael Banck
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20181221201616.gd4...@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml|  50 +++-
 src/bin/pg_checksums/pg_checksums.c   | 173 ++
 src/bin/pg_checksums/t/002_actions.pl |  76 ---
 src/tools/pgindent/typedefs.list  |   1 +
 4 files changed, 254 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..32bdb0f5c2 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  
   pg_checksums
-  verify data checksums in a PostgreSQL database cluster
+  enable, disable or check data checksums in a PostgreSQL database cluster
  
 
  
@@ -36,10 +36,19 @@ PostgreSQL documentation
  
   Description
   
-   pg_checksums verifies data checksums in a
-   PostgreSQL cluster.  The server must be shut
-   down cleanly before running pg_checksums.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   pg_checksums checks, enables or disables data
+   checksums in a PostgreSQL cluster.  The server
+   must be shut down cleanly before running
+   pg_checksums. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  
+
+  
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling checksums will only update the file
+   pg_control.
   
  
 
@@ -60,6 +69,37 @@ PostgreSQL documentation
   
  
 
+ 
+  -c
+  --check
+  
+   
+Checks checksums. This is the default mode if nothing else is
+specified.
+   
+  
+ 
+
+ 
+  -d
+  --disable
+  
+   
+Disables checksums.
+   
+  
+ 
+
+ 
+  -e
+  --enable
+  
+   
+Enables checksums.
+   
+  
+ 
+
  
   -v
   --verbose
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index b7ebc11017..339f6ad7f5 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,8 @@
 /*-
  *
  * pg_checksums.c
- *	  Verifies page level checksums in an offline cluster.
+ *	  Checks, enables or disables page level checksums for an offline
+ *	  cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -17,14 +18,15 @@
 #include 
 #include 
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
@@ -35,16 +37,38 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_MODE_CHECK,
+	PG_MODE_DISABLE,
+	PG_MODE_ENABLE
+} PgChecksumMode;
+
+/*
+ * Filename components.
+ *
+ * XXX: fd.h is not declared here as frontend side code is not able to
+ * interact with the backend-side definitions for the various fsync
+ * wrappers.
+ */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static PgChecksumMode mode = PG_MODE_CHECK;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Paquier
On Tue, Mar 19, 2019 at 09:47:17AM -0700, Andres Freund wrote:
> I'm not sure it needs to be this patch's responsibility to come up with
> a scheme here at all however. pg_rewind, pg_resetwal, pg_upgrade all
> don't really have a lockout mechanism, and it hasn't caused a ton of
> problems. I think it'd be good to invent something better, but it can't
> be some half-assed approach that'll lead people to think their database
> is gone.

Amen.  Take it as you wish, but that's actually what I was mentioning
upthread one week ago where I argued that it is a problem, but not a
problem of this patch and that this problem concerns other tools:
https://www.postgresql.org/message-id/20190313093150.ge2...@paquier.xyz
And then, my position was overturned by everybody on this thread.
So I am happy to see somebody chiming in and saying the same thing.

Honestly, I think that what I sent last week, with a patch in its
simplest form, would be enough:
https://www.postgresql.org/message-id/20190313021621.gp13...@paquier.xyz

In short, you keep the main feature with:
- No tweaks with postmaster.pid.
- Rely just on the control file indicating an instance shutdown
cleanly.
- No tweaks with the system ID.
- No renaming of the control file.
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Andres Freund
Hi,

On 2019-03-19 17:30:16 +0100, Michael Banck wrote:
> On Tuesday 2019-03-19 at 09:13 -0700, Andres Freund wrote:
> > On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> > > On Tuesday 2019-03-19 at 09:00 -0700, Andres Freund wrote:
> > > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > > > On Tuesday 2019-03-19 at 08:36 -0700, Andres Freund wrote:
> > > > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > > > +/*
> > > > > > > + * Locations of persistent and temporary control files.  The 
> > > > > > > control
> > > > > > > + * file gets renamed into a temporary location when enabling 
> > > > > > > checksums
> > > > > > > + * to prevent a parallel startup of Postgres.
> > > > > > > + */
> > > > > > > +#define CONTROL_FILE_PATH    "global/pg_control"
> > > > > > > +#define CONTROL_FILE_PATH_TEMP   CONTROL_FILE_PATH 
> > > > > > > ".pg_checksums_in_progress"
> > > > > > 
> > > > > > I think this should be outright rejected. Again, you're making the
> > > > > > control file into something it isn't. And there's no buyin for this 
> > > > > > as
> > > > > > far as I can tell outside of Fabien and you. For crying out loud, 
> > > > > > if the
> > > > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > > > 
> > > > > The cluster is supposed to be offline during this.  This is just an
> > > > > additional precaution so that nobody starts it during the operation -
> > > > > similar to how pg_upgrade disables the old data directory.
> > > > 
> > > > I don't see how that matters. Afterwards the cluster needs low level
> > > > surgery to be recovered. That's a) undocumented b) likely to be done
> > > > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
> > > 
> > > Can you explain why low level surgery is needed and how that would look
> > > like?
> > > 
> > > If pg_checksums successfully enables checksums, it will move back the
> > > control file and update the checksum version - the cluster is ready to
> > > be started again unless I am missing something?
> > > 
> > > If pg_checksums is interrupted by the admin, it will move back the
> > > control file and the cluster is ready to be started again as well.
> > > 
> > > If pg_checksums aborts with a failure, the admin will have to move back
> > > the control file before starting up the instance again, but I don't
> > > think that counts?
> > 
> > That absolutely counts. Even a short period would imo be unacceptable,
> > but this will take *hours* in many clusters. It's completely possible
> > that the machine crashes while the enabling is in progress.
> > 
> > And after restarting postgres or even pg_checksums, you'll just get a
> > message that there's no control file. How on earth is a normal user
> > supposed to recover from that?  Now, you could have a check for the
> > control file under the temporary name, and emit a hint about renaming,
> but that has its own dangers (like people renaming it just to start
> > postgres).
> 
> Ok, thanks for explaining. 
> 
> I guess if we check for the temporary name in postmaster during startup
> if pg_control isn't there then a more generally useful name like
> "pg_control.maintenance" should be picked. We could then spit out a nice
> error message or hint like "the cluster has been disabled for
> maintenance. In order to start it up anyway, rename
> pg_control.maintenance to pg_control" or so.

To be very clear: I am going to try to stop any patch with this
mechanism from going into the tree. I think it's an absurdly bad
idea. There'd need to be significant support from a number of other
committers to convince me otherwise.


> In any case, this would be more of a operational or availability issue
> and not a data-loss issue, as I feared from your previous mails.

It's just about indistinguishable for normal users.


> > And you're basically adding it because Fabien doesn't like
> > postmaster.pid and wants to invent another lockout mechanism in this
> > thread.
> 
> I think the hazard of another DBA (or some automated configuration
> management or HA tool for that matter) looking at postmaster.pid,
> deciding that it is not a legit file from a running instance, deleting
> it and then starting up Postgres while pg_checksums is still at work is
> worse than the above scenario, but maybe if we make the content of
> postmaster.pid clear enough (like "maintenance in progress"?) it would
> be enough of a hint?

Err, how would such a tool decide to do that safely? And if it did so,
how would it not cause problems in postgres' normal operation, given
that postmaster.pid is crucial to prevent two postgres instances
running at the same time?


> Or do you have concrete suggestions on how this should work?

create a postmaster.pid with the pid of pg_checksums. That'll trigger a
postgres start from checking whether that pid is still alive. There'd
probably need to be a tiny change to CreateLockFile() to prevent it from
checking whether any shared 
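A rough sketch of that suggested locking (hypothetical code, not from any
patch in this thread; the real postmaster.pid has a multi-line format, and
CreateLockFile() would need the tweak mentioned above):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Create postmaster.pid exclusively with our own pid, so that a
 * concurrent server start finds the lock file and refuses to run.
 */
static int
acquire_data_dir_lock(const char *datadir)
{
	char	path[1024];
	char	buf[32];
	int		fd;

	snprintf(path, sizeof(path), "%s/postmaster.pid", datadir);

	/* O_EXCL makes creation atomic: fail if someone holds the lock. */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	snprintf(buf, sizeof(buf), "%d\n", (int) getpid());
	if (write(fd, buf, strlen(buf)) != (ssize_t) strlen(buf))
	{
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);
	return 0;				/* caller unlinks the file when done */
}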

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Banck
Hi,

On Tuesday 2019-03-19 at 09:13 -0700, Andres Freund wrote:
> On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> > On Tuesday 2019-03-19 at 09:00 -0700, Andres Freund wrote:
> > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > > On Tuesday 2019-03-19 at 08:36 -0700, Andres Freund wrote:
> > > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > > +/*
> > > > > > + * Locations of persistent and temporary control files.  The 
> > > > > > control
> > > > > > + * file gets renamed into a temporary location when enabling 
> > > > > > checksums
> > > > > > + * to prevent a parallel startup of Postgres.
> > > > > > + */
> > > > > > +#define CONTROL_FILE_PATH  "global/pg_control"
> > > > > > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH 
> > > > > > ".pg_checksums_in_progress"
> > > > > 
> > > > > I think this should be outright rejected. Again, you're making the
> > > > > control file into something it isn't. And there's no buyin for this as
> > > > > far as I can tell outside of Fabien and you. For crying out loud, if 
> > > > > the
> > > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > > 
> > > > The cluster is supposed to be offline during this.  This is just an
> > > > additional precaution so that nobody starts it during the operation -
> > > > similar to how pg_upgrade disables the old data directory.
> > > 
> > > I don't see how that matters. Afterwards the cluster needs low level
> > > surgery to be recovered. That's a) undocumented b) likely to be done
> > > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
> > 
> > Can you explain why low level surgery is needed and how that would look
> > like?
> > 
> > If pg_checksums successfully enables checksums, it will move back the
> > control file and update the checksum version - the cluster is ready to
> > be started again unless I am missing something?
> > 
> > If pg_checksums is interrupted by the admin, it will move back the
> > control file and the cluster is ready to be started again as well.
> > 
> > If pg_checksums aborts with a failure, the admin will have to move back
> > the control file before starting up the instance again, but I don't
> > think that counts?
> 
> That absolutely counts. Even a short period would imo be unacceptable,
> but this will take *hours* in many clusters. It's completely possible
> that the machine crashes while the enabling is in progress.
> 
> And after restarting postgres or even pg_checksums, you'll just get a
> message that there's no control file. How on earth is a normal user
> supposed to recover from that?  Now, you could have a check for the
> control file under the temporary name, and emit a hint about renaming,
> but that has its own dangers (like people renaming it just to start
> postgres).

Ok, thanks for explaining. 

I guess if we check for the temporary name in postmaster during startup
if pg_control isn't there then a more generally useful name like
"pg_control.maintenance" should be picked. We could then spit out a nice
error message or hint like "the cluster has been disabled for
maintenance. In order to start it up anyway, rename
pg_control.maintenance to pg_control" or so.

In any case, this would be more of a operational or availability issue
and not a data-loss issue, as I feared from your previous mails.

> And you're basically adding it because Fabien doesn't like
> postmaster.pid and wants to invent another lockout mechanism in this
> thread.

I think the hazard of another DBA (or some automated configuration
management or HA tool for that matter) looking at postmaster.pid,
deciding that it is not a legit file from a running instance, deleting
it and then starting up Postgres while pg_checksums is still at work is
worse than the above scenario, but maybe if we make the content of
postmaster.pid clear enough (like "maintenance in progress"?) it would
be enough of a hint? Or do you have concrete suggestions on how this
should work?

I had the feeling (ab)using postmaster.pid for this would fly less than
using the same scheme as pg_upgrade does, but I'm fine doing it either
way.


Michael

-- 
Michael Banck



Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Andres Freund
Hi,

On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > +/*
> > > > > + * Locations of persistent and temporary control files.  The control
> > > > > + * file gets renamed into a temporary location when enabling 
> > > > > checksums
> > > > > + * to prevent a parallel startup of Postgres.
> > > > > + */
> > > > > +#define CONTROL_FILE_PATH"global/pg_control"
> > > > > +#define CONTROL_FILE_PATH_TEMP   CONTROL_FILE_PATH 
> > > > > ".pg_checksums_in_progress"
> > > > 
> > > > I think this should be outright rejected. Again, you're making the
> > > > control file into something it isn't. And there's no buyin for this as
> > > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > 
> > > The cluster is supposed to be offline during this.  This is just an
> > > additional precaution so that nobody starts it during the operation -
> > > similar to how pg_upgrade disables the old data directory.
> > 
> > I don't see how that matters. Afterwards the cluster needs low level
> > surgery to be recovered. That's a) undocumented b) likely to be done
> > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
> 
> Can you explain why low level surgery is needed and how that would look
> like?
> 
> If pg_checksums successfully enables checksums, it will move back the
> control file and update the checksum version - the cluster is ready to
> be started again unless I am missing something?
> 
> If pg_checksums is interrupted by the admin, it will move back the
> control file and the cluster is ready to be started again as well.
> 
> If pg_checksums aborts with a failure, the admin will have to move back
> the control file before starting up the instance again, but I don't
> think that counts?

That absolutely counts. Even a short period would imo be unacceptable,
but this will take *hours* in many clusters. It's completely possible
that the machine crashes while the enabling is in progress.

And after restarting postgres or even pg_checksums, you'll just get a
message that there's no control file. How on earth is a normal user
supposed to recover from that?  Now, you could have a check for the
control file under the temporary name, and emit a hint about renaming,
but that has its own angers (like people renaming it just to start
postgres).

And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.

Greetings,

Andres Freund



Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Banck
Hi,

Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > +/*
> > > > + * Locations of persistent and temporary control files.  The control
> > > > + * file gets renamed into a temporary location when enabling checksums
> > > > + * to prevent a parallel startup of Postgres.
> > > > + */
> > > > +#define CONTROL_FILE_PATH  "global/pg_control"
> > > > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH 
> > > > ".pg_checksums_in_progress"
> > > 
> > > I think this should be outright rejected. Again, you're making the
> > > control file into something it isn't. And there's no buyin for this as
> > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > 
> > The cluster is supposed to be offline during this.  This is just an
> > additional precaution so that nobody starts it during the operation -
> > similar to how pg_upgrade disables the old data directory.
> 
> I don't see how that matters. Afterwards the cluster needs low level
> surgery to be recovered. That's a) undocumented b) likely to be done
> wrongly.  This is completely unacceptable *AND UNNECESSARY*.

Can you explain why low level surgery is needed and how that would look
like?

If pg_checksums successfully enables checksums, it will move back the
control file and update the checksum version - the cluster is ready to
be started again unless I am missing something?

If pg_checksums is interrupted by the admin, it will move back the
control file and the cluster is ready to be started again as well.

If pg_checksums aborts with a failure, the admin will have to move back
the control file before starting up the instance again, but I don't
think that counts?

If pg_checksums crashes due to I/O failures or other causes I can see
how possibly the block it was currently writing might need low level
surgery, but in that case we are in the domain of forensics already I
guess and that still does not pertain to the control file?

Sorry for being obtuse, I don't get it.


Michael

-- 
Michael Banck



Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Andres Freund
On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> Hi,
> 
> Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > +/*
> > > + * Locations of persistent and temporary control files.  The control
> > > + * file gets renamed into a temporary location when enabling checksums
> > > + * to prevent a parallel startup of Postgres.
> > > + */
> > > +#define CONTROL_FILE_PATH"global/pg_control"
> > > +#define CONTROL_FILE_PATH_TEMP   CONTROL_FILE_PATH 
> > > ".pg_checksums_in_progress"
> > 
> > I think this should be outright rejected. Again, you're making the
> > control file into something it isn't. And there's no buyin for this as
> > far as I can tell outside of Fabien and you. For crying out loud, if the
> > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> 
> The cluster is supposed to be offline during this.  This is just an
> additional precaution so that nobody starts it during the operation -
> similar to how pg_upgrade disables the old data directory.

I don't see how that matters. Afterwards the cluster needs low level
surgery to be recovered. That's a) undocumented b) likely to be done
wrongly.  This is completely unacceptable *AND UNNECESSARY*.

Greetings,

Andres Freund



Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Banck
Hi,

Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > +/*
> > + * Locations of persistent and temporary control files.  The control
> > + * file gets renamed into a temporary location when enabling checksums
> > + * to prevent a parallel startup of Postgres.
> > + */
> > +#define CONTROL_FILE_PATH  "global/pg_control"
> > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH 
> > ".pg_checksums_in_progress"
> 
> I think this should be outright rejected. Again, you're making the
> control file into something it isn't. And there's no buyin for this as
> far as I can tell outside of Fabien and you. For crying out loud, if the
> server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this.  This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.


Michael

-- 
Michael Banck



Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Andres Freund
Hi,

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> +/*
> + * Locations of persistent and temporary control files.  The control
> + * file gets renamed into a temporary location when enabling checksums
> + * to prevent a parallel startup of Postgres.
> + */
> +#define CONTROL_FILE_PATH"global/pg_control"
> +#define CONTROL_FILE_PATH_TEMP   CONTROL_FILE_PATH 
> ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buyin for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

- Andres



Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Fabien COELHO




Ok, this might not work, because of the following, less likely, race
condition:

 postmaster opens control file RW
 pg_checksums moves control file, postmaster open file handle follows
 ...

So ISTM that we really need some locking to have something clean.


We are talking about complicating a method which is already fine for a
window where the whole operation works, as it could take hours to
enable checksums, versus a couple of instructions.  I am not sure that
it is worth complicating the code more.


Hmmm. Possibly. The point is that anything only needs to be implemented 
once. The whole point of pg is to have ACID transactional properties, but 
it does not achieve that on the controlfile, which I find paradoxical:-)


Now if there is a race condition opportunity, ISTM that it should be as 
short as possible. Renaming before manipulating seems safer if other 
commands proceed like that as well. Probably if pg always renames *THEN* 
opens before doing anything in all commands it could be safe, provided that 
the renaming is atomic, which I think is the case.


That would avoid locking, at the price of a small probability of having a 
controlfile in a controlfile.command-that-failed-at-the-wrong-time state. 
Maybe it is okay. Maybe there is a need to be able to force the 
state back to something to recover from such unlikely event, but probably 
it does already exists (eg postmaster could be dead without releasing the 
controlfile state).


--
Fabien.



Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Banck
Hi,

Am Dienstag, den 19.03.2019, 11:48 +0100 schrieb Fabien COELHO:
> Moving the controlfile looks like an effective way to prevent any 
> concurrent start, as the fs operation is probably atomic and especially if 
> external tools use the same trick. However, this is not the case yet, eg 
> "pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method 
> could be unified, possibly with some functions in "controlfile_utils.c".
> 
> However, I think that there still is a race condition because of the order 
> in which it is implemented:
> 
>   pg_checksums reads control file
>   pg_checksums checks control file contents...
>   ** cluster may be started and the control file updated
>   pg_checksums moves the (updated) control file
>   pg_checksums proceeds on a running cluster
>   pg_checksums moves back the control file
>   pg_checksums updates the control file contents, overriding updates
> 
> I think that the correct way to handle this for enable/disable is:
> 
>   pg_checksums moves the control file
>   pg_checksums reads, checks, proceeds, updates
>   pg_checksums moves back the control file
> 
> This probably means extending a little bit the update_controlfile function 
> to allow a suffix. No big deal.
> 
> Ok, this might not work, because of the following, less likely, race 
> condition:
> 
>   postmaster opens control file RW
>   pg_checksums moves control file, postmaster open file handle follows
>   ...
> 
> So ISTM that we really need some locking to have something clean.

I think starting the postmaster during offline maintenance is already
quite some pilot error. As pg_checksums can potentially run for hours
though, I agree it is important to disable the cluster in the meantime.

There's really not a lot going on between pg_checksums reading the
control file and moving it away - what you propose above sounds a bit
like overengineering to me.

If anything, we could include the postmaster.pid check from pg_resetwal
after we have renamed the control file to make absolutely sure that the
cluster is offline. Once the control file is gone and there is no
postmaster.pid, it surely cannot be pg_checksums' problem anymore if a
postmaster is started regardless of maintenance.
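
Sketched out, that borrowed check could look roughly as follows -- a
minimal sketch assuming the pg_resetwal behaviour described above, with
the error wording approximated rather than quoted:

#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch: refuse to continue if a postmaster.pid lock file exists,
 * as pg_resetwal does.  Real code would build the path from the data
 * directory and handle open() errors other than ENOENT.
 */
static void
check_server_not_running(const char *progname)
{
	FILE	   *pidfile = fopen("postmaster.pid", "r");

	if (pidfile != NULL)
	{
		fclose(pidfile);
		fprintf(stderr, "%s: lock file \"postmaster.pid\" exists\n",
				progname);
		fprintf(stderr, "Is a server running?  If not, delete the lock file and try again.\n");
		exit(1);
	}
}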

I leave that to Michael to decide whether he thinks the above is
warranted.

I think the more important open issue is what to do about PITR and
streaming replication, see my replies to Magnus elsewhere in the thread.

> Why not always use "pgrename" instead of the strange pg_mv_file macro?

pg_upgrade does it the same way, possibly both could be converted to
pgrename, dunno.


Michael

-- 
Michael Banck



Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Paquier
On Tue, Mar 19, 2019 at 11:48:25AM +0100, Fabien COELHO wrote:
> Moving the controlfile looks like an effective way to prevent any concurrent
> start, as the fs operation is probably atomic and especially if external
> tools use the same trick. However, this is not the case yet, eg
> "pg_resetwal" uses a "postmaster.pid" hack instead.

pg_upgrade does so.  Note that pg_resetwal does not check whether the
PID in the file actually belongs to a running process, either.

> Probably the method could be unified, possibly with some functions
> in "controlfile_utils.c".

Hm.  postmaster.pid is just here to make sure that the instance is not
started at all, while we require the instance to be stopped cleanly
with other tools, so that's not really consistent in my opinion to
combine both.

> Ok, this might not work, because of the following, less likely, race
> condition:
> 
>  postmaster opens control file RW
>  pg_checksums moves control file, postmaster open file handle follows
>  ...
> 
> So ISTM that we really need some locking to have something clean.

We are talking about complicating a method which is already fine for a
window where the whole operation works, as it could take hours to
enable checksums, versus a couple of instructions.  I am not sure that
it is worth complicating the code more.

> Help line about --check not simplified as suggested in a prior review,
> although you said you would take it into account.

Oops, it looks like this got lost because of the successive rebases.
I am sure to have updated it at some point..  Anyway, thanks for
pointing it out, I got that fixed on my local branch.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Fabien COELHO


Bonjour Michaël,


Please find attached an updated patch set, I have rebased that stuff
on top of my recent commits to refactor the control file updates.


Patch applies cleanly, compiles, make check-world seems ok, doc build ok.

It would help if the patch includes a version number. I assume that this 
is v7.


Doc looks ok.

Moving the controlfile looks like an effective way to prevent any 
concurrent start, as the fs operation is probably atomic and especially if 
external tools use the same trick. However, this is not the case yet, eg 
"pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method 
could be unified, possibly with some functions in "controlfile_utils.c".


However, I think that there still is a race condition because of the order 
in which it is implemented:


 pg_checksums reads control file
 pg_checksums checks control file contents...
 ** cluster may be started and the control file updated
 pg_checksums moves the (updated) control file
 pg_checksums proceeds on a running cluster
 pg_checksums moves back the control file
 pg_checksums updates the control file contents, overriding updates

I think that the correct way to handle this for enable/disable is:

 pg_checksums moves the control file
 pg_checksums reads, checks, proceeds, updates
 pg_checksums moves back the control file

This probably means extending a little bit the update_controlfile function 
to allow a suffix. No big deal.


Ok, this might not work, because of the following, less likely, race 
condition:


 postmaster opens control file RW
 pg_checksums moves control file, postmaster open file handle follows
 ...

So ISTM that we really need some locking to have something clean.

Why not always use "pgrename" instead of the strange pg_mv_file macro?

Help line about --check not simplified as suggested in a prior review, 
although you said you would take it into account.


Tests look ok.

--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-18 Thread Michael Paquier
On Fri, Mar 15, 2019 at 01:37:27PM +0100, Michael Banck wrote:
> Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier:
>> Perhaps having them under --verbose makes more sense?
> 
> Well if we think it is essential in order to tell the user what happened
> in the case of an error, it shouldn't be verbose I guess.

I would still keep them to be honest.  I don't know, if others find
the tool too chatty we could always rework that part and tune it.

Please find attached an updated patch set, I have rebased that stuff
on top of my recent commits to refactor the control file updates.
While reviewing, I have found a problem in the docs (forgot a 
markup previously), and there was a problem with the parent path fsync
causing an interruption to not return the correct error code, and
actually we should just use durable_rename() in this case (if
--no-sync gets in then pg_mv_file() should be used of course).

I have also been thinking about what we could add in the
documentation, so this version adds a draft to describe the cases
where enabling checksums can lead to corruption when involving
multiple nodes in a cluster and tools doing physical copy of relation
blocks.

I have not done the --no-sync part yet on purpose, as that will most
likely conflict based on the feedback received for this version..
--
Michael
From a85112d87ec4bc4b00d22c105b9958a2c70c3758 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 18 Mar 2019 17:12:15 +0900
Subject: [PATCH] Add options to enable and disable checksums in pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When using --disable, only the control file is updated and then
flushed.  When using --enable, the process gets more complicated as the
operation can be long:
- Rename the control file to a temporary name, to prevent a parallel
startup of Postgres.
- Scan all files and update their checksums.
- Rename back the control file.
- Flush the data directory.
- Update the control file and then flush it, to make the change
durable.
If the operation is interrupted, the control file gets moved back in
place.

If no mode is specified in the options, then --check is used for
compatibility with older versions of pg_verify_checksums (now renamed to
pg_checksums in v12).

Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Magnus Hagander, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.gd4...@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml|  72 ++-
 src/bin/pg_checksums/pg_checksums.c   | 280 +++---
 src/bin/pg_checksums/t/002_actions.pl |  76 +--
 src/tools/pgindent/typedefs.list  |   1 +
 4 files changed, 381 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..a7f4ef1024 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  
   pg_checksums
-  verify data checksums in a PostgreSQL database cluster
+  enable, disable or check data checksums in a PostgreSQL database cluster
  
 
  
@@ -36,10 +36,24 @@ PostgreSQL documentation
  
   Description
   
-   pg_checksums verifies data checksums in a
-   PostgreSQL cluster.  The server must be shut
-   down cleanly before running pg_checksums.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   pg_checksums checks, enables or disables data
+   checksums in a PostgreSQL cluster.  The server
+   must be shut down cleanly before running
+   pg_checksums. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  
+
+  
+   Checking checksums requires scanning every file holding them in the data
+   folder.  Disabling checksums requires only an update of the file
+   pg_control.  Enabling checksums first renames
+   the file pg_control to
+   pg_control.pg_checksums_in_progress to prevent
+   a parallel startup of the cluster, then it updates all files with
+   checksums, and it finishes by renaming and updating
+   pg_control to mark checksums as enabled.
   
  
 
@@ -60,6 +74,37 @@ PostgreSQL documentation
   
  
 
+ 
+  -c
+  --check
+  
+   
+Checks checksums. This is the default mode if nothing else is
+specified.
+   
+  
+ 
+
+ 
+  -d
+  --disable
+  
+   
+Disables checksums.
+   
+  
+ 
+
+ 
+  -e
+  --enable
+  
+   
+Enables checksums.
+   

Re: Offline enabling/disabling of data checksums

2019-03-18 Thread Fabien COELHO


Bonjour Michaël,


Here are my notes about the fixes:


Thanks for the fixes.


- pg_resetwal got broken because the path to the control file was
incorrect.  Running tests of pg_upgrade or the TAP tests of
pg_resetwal showed the failure.


Hmmm… I thought I had done that with "make check-world":-(


- pg_rewind was issuing a flush of the control file even if --no-sync
was used.


Indeed, I missed this one.


- Nit: incorrect header order in controldata_utils.c.  I have kept the
backend-only includes grouped though.


I'll pay attention to that the next time.

Thanks for the push.

--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:
> I kept the initial no-parameter function which calls the new one with 4
> parameters, though, because it looks more homogeneous this way in the
> backend code. This is debatable.

From a compatibility point of view, your position actually makes
sense, at least to me and after sleeping on it as UpdateControlFile is
not static, and that there are interactions with the other local
routines to read and write the control file because of the variable
ControlFile at the top of xlog.c.  So I have kept the original
interface, being now only a wrapper of the new routine.

> Attached is an update.

Thanks, I have committed the patch after fixing a couple of things.
After considering the interface, I have switched to a single boolean
as I could not actually imagine what kind of fancy features this
could be extended with any further.  If I am wrong, let's adjust it later
on.  Here are my notes about the fixes:
- pg_resetwal got broken because the path to the control file was
incorrect.  Running tests of pg_upgrade or the TAP tests of
pg_resetwal showed the failure.
- The previously-mentioned problem with close() in the new routine is
fixed.
- Header comments at the top of update_controlfile were a bit messed
up (s/Not/Note/, missed an "up" as well).
- pg_rewind was issuing a flush of the control file even if --no-sync
was used.
- Nit: incorrect header order in controldata_utils.c.  I have kept the
backend-only includes grouped though.
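
For readers following along, the resulting shared routine then has
roughly the following shape -- a sketch only, the exact parameter list
being an assumption from this discussion rather than a quote of the
committed code:

/*
 * Sketch: shared between frontend and backend in
 * src/common/controldata_utils.c.  Writes ControlFile back into the
 * control file of DataDir, and flushes it unless do_sync is false.
 */
extern void update_controlfile(const char *DataDir, const char *progname,
							   ControlFileData *ControlFile, bool do_sync);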
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO



You have one error at the end of update_controlfile(), where close() 
could issue a frontend-like error for the backend, calling exit() on the 
way.  That's not good. (No need to send a new patch, I'll fix it 
myself.)


Indeed. I meant to merge the "if (close(fd))", but ended merging the error 
generation as well.


--
Fabien



Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:
> I could remove the two "catalog/" includes from pg_resetwal, I assume that
> you meant these ones.

Not exactly.  What I meant is that if you try to call fsync_fname
and fsync_parent_path directly from file_utils.h, then you get into
trouble because of xlog.h.  Sure, you can also remove the ones you
removed.

> Hmmm. I just did that, but what about just a boolean? What other options
> could be required? Maybe some locking/checking?

The caller is already expected to properly take
ControlFileLock.  Note I tend to worry too much about the
extensibility of published APIs these days as well, so perhaps just a
boolean would be fine; please let me reconsider that after some sleep.
It is not like the contents of this routine are going to become
much more complicated either, except potentially to control the flags on
open(). :p

> I kept the initial no-parameter function which calls the new one with 4
> parameters, though, because it looks more homogeneous this way in the
> backend code. This is debatable.

True, this actually makes back-patching a bit easier, and there are 13
calls of UpdateControlFile().

> Attached is an update.

Thanks, I'll take a look at that tomorrow.  You have one error at the
end of update_controlfile(), where close() could issue a frontend-like
error for the backend, calling exit() on the way.  That's not good.
(No need to send a new patch, I'll fix it myself.)
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO


Michaël-san,


The issue here is that trying to embed directly the fsync routines
from file_utils.c into pg_resetwal.c messes up the inclusions because
pg_resetwal.c includes backend-side includes, which themselves touch
fd.h :(

In short your approach avoids some extra mess with the include
dependencies.


I could remove the two "catalog/" includes from pg_resetwal, I assume that 
you meant these ones.



Maybe the two changes could be committed separately.


I was thinking about this one, and for pg_rewind we don't care about
the fsync of the control file because the full data folder gets
fsync'd afterwards and in the event of a crash in the middle of a
rewind the target data folder is surely not something to use, but we
do for pg_checksums, and we do for pg_resetwal.  Even if there is the
argument that usually callers of update_controlfile() would care a
lot about the control file and fsync it, I think that we need some
control on if we do the fsync or not because many tools have a
--no-sync and that should be fully respected.


So while your patch is on a good track, I would suggest doing the 
following things to complete it:


- Add an extra argument bits16 to update_controlfile to pass a set of 
optional flags, with NOSYNC being the only and current value.  The 
default is to flush the file.


Hmmm. I just did that, but what about just a boolean? What other options 
could be required? Maybe some locking/checking?
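
For concreteness, a minimal sketch of the flags-style variant being
debated here; CF_OPT_NONE mirrors the constant visible in the attached
patch, while CF_OPT_NOSYNC is a hypothetical name for the --no-sync
case:

/* Sketch only, not committed code. */
#define CF_OPT_NONE		0x00	/* default: write pg_control and fsync it */
#define CF_OPT_NOSYNC	0x01	/* write only, skip the flush */

extern void update_controlfile(const char *DataDir, const char *file,
							   ControlFileData *ControlFile, bits16 flags);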



- Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.


Done.


- And then delete UpdateControlFile() in xlog.c, and use
update_controlfile() instead to remove even more code.


I was keeping that one for another patch because it touches the backend 
code, but it makes sense to do that in one go for consistency.


I kept the initial no-parameter function which calls the new one with 4 
parameters, though, because it looks more homogeneous this way in the 
backend code. This is debatable.


The version in xlog.c uses BasicOpenFile(), so we should also use that 
in update_controlfile() instead of OpenTransientFile(). As any errors 
result in a PANIC we don't care about leaking fds.


Done.

Attached is an update.

--
Fabien.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 54d3c558c6..7f782e255c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -4757,48 +4758,7 @@ ReadControlFile(void)
 void
 UpdateControlFile(void)
 {
-	int			fd;
-
-	INIT_CRC32C(ControlFile->crc);
-	COMP_CRC32C(ControlFile->crc,
-(char *) ControlFile,
-offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile->crc);
-
-	fd = BasicOpenFile(XLOG_CONTROL_FILE,
-	   O_RDWR | PG_BINARY);
-	if (fd < 0)
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", XLOG_CONTROL_FILE)));
-
-	errno = 0;
-	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE);
-	if (write(fd, ControlFile, sizeof(ControlFileData)) != sizeof(ControlFileData))
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg("could not write to file \"%s\": %m",
-		XLOG_CONTROL_FILE)));
-	}
-	pgstat_report_wait_end();
-
-	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE);
-	if (pg_fsync(fd) != 0)
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg("could not fsync file \"%s\": %m",
-		XLOG_CONTROL_FILE)));
-	pgstat_report_wait_end();
-
-	if (close(fd))
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg("could not close file \"%s\": %m",
-		XLOG_CONTROL_FILE)));
+	update_controlfile(".", XLOG_CONTROL_FILE, ControlFile, CF_OPT_NONE);
 }
 
 /*
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 2af8713216..334903711e 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -49,8 +49,7 @@
 #include "access/multixact.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
-#include "catalog/catversion.h"
-#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
 #include "common/fe_memutils.h"
 #include "common/file_perm.h"
 #include "common/restricted_token.h"
@@ -918,18 +917,6 @@ PrintNewControlValues(void)
 static void
 RewriteControlFile(void)
 {
-	int			fd;
-	char		buffer[PG_CONTROL_FILE_SIZE];	/* need not be aligned */
-
-	/*
-	 * For good luck, apply the same static assertions as in backend's
-	 * WriteControlFile().
-	 */
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
-	 "pg_control is too large for atomic disk writes");
-	

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 10:01:20AM +0100, Fabien COELHO wrote:
> The implementation basically removes a lot of copy paste and calls the new
> update_controlfile function instead. I like removing useless code:-)

Yes, I spent something like 10 minutes looking at that code yesterday
and I agree that removing the control file to recreate it is not
really necessary, even if the window between its removal and
recreation is short.

> I do not see the value of *not* fsyncing the control file when writing it,
> as it is by definition very precious, so I added a fsync. The server side
> branch uses the backend available "pg_fsync", which complies with server
> settings there and can do nothing if fsync is disabled.

The issue here is that trying to embed directly the fsync routines
from file_utils.c into pg_resetwal.c messes up the inclusions because
pg_resetwal.c includes backend-side includes, which themselves touch
fd.h :(

In short your approach avoids some extra mess with the include
dependencies.

> Maybe the two changes could be committed separately.

I was thinking about this one, and for pg_rewind we don't care about
the fsync of the control file because the full data folder gets
fsync'd afterwards and in the event of a crash in the middle of a
rewind the target data folder is surely not something to use, but we
do for pg_checksums, and we do for pg_resetwal.  Even if there is the
argument that usually callers of update_controlfile() would care a
lot about the control file and fsync it, I think that we need some
control on if we do the fsync or not because many tools have a
--no-sync and that should be fully respected.  So while your patch is
on a good track, I would suggest doing the following things to
complete it:
- Add an extra argument bits16 to update_controlfile to pass a set of
optional flags, with NOSYNC being the only and current value.  The
default is to flush the file.
- Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.
- And then delete UpdateControlFile() in xlog.c, and use
update_controlfile() instead to remove even more code.  The version in
xlog.c uses BasicOpenFile(), so we should also use that in
update_controlfile() instead of OpenTransientFile().  As any errors
result in a PANIC we don't care about leaking fds.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO


Bonjour Michaël-san,


Yes, that would be nice, for now I have focused.  For pg_resetwal yes
we could do it easily.  Would you like to send a patch?


Here is a proposal for "pg_resetwal".

The implementation basically removes a lot of copy paste and calls the 
new update_controlfile function instead. I like removing useless code:-)


The pg_resetwal implementation was doing a rm/create cycle, which was leaving 
a small window for losing the controlfile. Not neat.


I do not see the value of *not* fsyncing the control file when writing it, 
as it is by definition very precious, so I added a fsync. The server side 
branch uses the backend available "pg_fsync", which complies with server 
settings there and can do nothing if fsync is disabled.


Maybe the two changes could be committed separately.

--
Fabien.

diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 2af8713216..dd085e16ab 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -918,18 +918,6 @@ PrintNewControlValues(void)
 static void
 RewriteControlFile(void)
 {
-	int			fd;
-	char		buffer[PG_CONTROL_FILE_SIZE];	/* need not be aligned */
-
-	/*
-	 * For good luck, apply the same static assertions as in backend's
-	 * WriteControlFile().
-	 */
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
-	 "pg_control is too large for atomic disk writes");
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
-	 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
-
 	/*
 	 * Adjust fields as needed to force an empty XLOG starting at
 	 * newXlogSegNo.
@@ -961,53 +949,7 @@ RewriteControlFile(void)
 	ControlFile.max_prepared_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
-	/* Contents are protected with a CRC */
-	INIT_CRC32C(ControlFile.crc);
-	COMP_CRC32C(ControlFile.crc,
-(char *) &ControlFile,
-offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile.crc);
-
-	/*
-	 * We write out PG_CONTROL_FILE_SIZE bytes into pg_control, zero-padding
-	 * the excess over sizeof(ControlFileData).  This reduces the odds of
-	 * premature-EOF errors when reading pg_control.  We'll still fail when we
-	 * check the contents of the file, but hopefully with a more specific
-	 * error than "couldn't read pg_control".
-	 */
-	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
-	memcpy(buffer, &ControlFile, sizeof(ControlFileData));
-
-	unlink(XLOG_CONTROL_FILE);
-
-	fd = open(XLOG_CONTROL_FILE,
-			  O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
-			  pg_file_create_mode);
-	if (fd < 0)
-	{
-		fprintf(stderr, _("%s: could not create pg_control file: %s\n"),
-progname, strerror(errno));
-		exit(1);
-	}
-
-	errno = 0;
-	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
-		fprintf(stderr, _("%s: could not write pg_control file: %s\n"),
-progname, strerror(errno));
-		exit(1);
-	}
-
-	if (fsync(fd) != 0)
-	{
-		fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
-		exit(1);
-	}
-
-	close(fd);
+	update_controlfile(XLOG_CONTROL_FILE, progname, &ControlFile);
 }
 
 
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 71e67a2eda..78ce8b020f 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -144,9 +144,9 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
  * update_controlfile()
  *
  * Update controlfile values with the contents given by caller.  The
- * contents to write are included in "ControlFile".  Note that it is up
- * to the caller to fsync the updated file, and to properly lock
- * ControlFileLock when calling this routine in the backend.
+ * contents to write are included in "ControlFile". Not that it is
+ * to the caller to properly lock ControlFileLock when calling this
+ * routine in the backend.
  */
 void
 update_controlfile(const char *DataDir, const char *progname,
@@ -216,6 +216,21 @@ update_controlfile(const char *DataDir, const char *progname,
 #endif
 	}
 
+#ifndef FRONTEND
+	if (pg_fsync(fd) != 0)
+		ereport(PANIC,
+(errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+		ControlFilePath)));
+#else
+	if (fsync(fd) != 0)
+	{
+		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+progname, ControlFilePath, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+#endif
+
 #ifndef FRONTEND
 	if (CloseTransientFile(fd))
 		ereport(PANIC,


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi,

Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier:
> On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:
> > 1. There's a typo in line 578 which makes it fail to compile:
> > 
> > > src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first 
> > > use in this function)
> > >   }y
> 
> I am wondering where you got this one.  My local branch does not have
> it, and the patch I sent does not seem to have it either.

Mea culpa, I must've fat fingered something in the editor before
applying your patch, sorry. I should've double-checked.

> > 2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
> > with the controlfile_path?
> 
> PG_MODE_DISABLE needs controlfile_path as well.  We could make the
> cleanup only available when using --enable; the code just looked
> simpler in its current shape.  I think it's just simpler to set
> everything unconditionally.  This code may become more complicated in
> the future.

Ok.

> > 3. There's (I think) leftover debug output in the following places:
> > 
> > > + printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
> > > + controlfile_path_temp);
> > > + printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
> > > + controlfile_path);
> > > + printf(_("Syncing data folder\n"));
> > 
> > (that one is debatable, we are mentioning this only in verbose mode in
> > pg_basebackup but pg_checksums is more chatty anyway, so probably
> > fine).
> 
> This is wanted.  Many folks have been complaining on this thread about
> crashes and such, surely we want logs about what happens :)
> 
> > > + printf(_("Updating control file\n"));
> > 
> > Besides the syncing message (which is user-relevant because they might
> > wonder what is taking so long), the others seem to be implementation
> > details we don't need to tell the user about.
> 
> Perhaps having them under --verbose makes more sense?

Well if we think it is essential in order to tell the user what happened
in the case of an error, it shouldn't be verbose I guess.


Michael

-- 
Michael Banck



Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:
> 1. There's a typo in line 578 which makes it fail to compile:
> 
> |src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use 
> in this function)
> |   }y

I am wondering where you got this one.  My local branch does not have
it, and the patch I sent does not seem to have it either.

> 2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
> with the controlfile_path?

PG_MODE_DISABLE needs controlfile_path as well.  We could make the
cleanup only available when using --enable; the code just looked
simpler in its current shape.  I think it's just simpler to set
everything unconditionally.  This code may become more complicated in
the future.

> 3. There's (I think) leftover debug output in the following places:
> 
> |+printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
> |+controlfile_path_temp);
> 
> |+printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
> |+controlfile_path);
> 
> |+printf(_("Syncing data folder\n"));
> 
> (that one is debatable, we are mentioning this only in verbose mode in
> pg_basebackup but pg_checksums is more chatty anyway, so probably
> fine).

This is wanted.  Many folks have been complaining on this thread about
crashes and such, surely we want logs about what happens :)

> |+printf(_("Updating control file\n"));
> 
> Besides the syncing message (which is user-relevant because they might
> wonder what is taking so long), the others seem to be implementation
> details we don't need to tell the user about.

Perhaps having them under --verbose makes more sense?
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi,

Am Freitag, den 15.03.2019, 11:50 +0900 schrieb Michael Paquier:
> I have been able to grab some time to incorporate the feedback gathered
> on this thread, and please find attached a new version of the patch to
> add --enable/--disable. 

Some more feedback:

1. There's a typo in line 578 which makes it fail to compile:

|src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in 
this function)
|   }y

2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
with the controlfile_path?

3. There's (I think) leftover debug output in the following places:

|+  printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
|+  controlfile_path_temp);

|+  printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
|+  controlfile_path);

|+  printf(_("Syncing data folder\n"));

(that one is debatable, we are mentioning this only in verbose mode in
pg_basebackup but pg_checksums is more chatty anyway, so probably fine).

|+  printf(_("Updating control file\n"));

Besides the syncing message (which is user-relevant because they might
wonder what is taking so long), the others seem to be implementation
details we don't need to tell the user about.


Michael

-- 
Michael Banck



Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi,

Am Mittwoch, den 13.03.2019, 12:24 +0100 schrieb Magnus Hagander:
> On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov  wrote:
> > One new question from me: how about replication?
> > Case: primary+replica, we shut down primary and enable checksum, and
> > "started streaming WAL from primary" without any issue. I have
> > master with checksums, but replica without.
> > Or cluster with checksums, then disable checksums on primary, but
> > standby think we have checksums.
> 
> Enabling or disabling the checksums offline on the master quite
> clearly requires a rebuild of the standby, there is no other way (this
> is one of the reasons for the online enabling in that patch, so I
> still hope we can get that done -- but not for this version).
> 
> You would have the same with PITR backups for example. 

I'd like to get back to PITR. 

I thought about this a bit and actually I think PITR might be fine in
the sense that if you enabled or disabled checksums after the last
basebackup and then do PITR with the available WAL beyond that, you would
get a working cluster, just with the checksum state the cluster had at
the time of the basebackup. I think that would be entirely acceptable, so
long as nothing else breaks?

I made some quick tests and did see no errors, but maybe I am missing
something?
 
> And especially if you have some tool that does block or segment level
> differential.

This might be the case, but I am not sure if PostgreSQL core must worry about
it? Obviously the documentation must be made explicit about these kinds
of cases.


Michael

-- 
Michael Banck



Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 09:52:11AM +0100, Magnus Hagander wrote:
> As I said, that's a big hammer. I'm all for having a better solution. But I
> don't think it's acceptable not to have *any* defense against it, given how
> bad corruption it can lead to.

Hm...  It looks like my arguments are not convincing enough.  I am not
really convinced that there is any need to make that the default, nor
does it make much sense to embed that stuff directly into pg_checksums
because that's actually just doing an extra step which is equivalent
to calling pg_resetwal, and we know that this tool has the awesome
reputation of causing more harm than anything else.  At least I would
like to have an option which allows *not* updating the system
identifier, so that the cases I mentioned would be
supported, because then it becomes possible to enable checksums on a
primary with only a failover as long as page copies are not directly
involved and that all operations go through WAL.  And that would be
quite nice.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 4:26 PM Michael Banck 
wrote:

> Hi,
>
> Am Donnerstag, den 14.03.2019, 15:26 +0100 schrieb Magnus Hagander:
> > Given that the failure is data corruption, I don't think big fat
> > warning is enough. We should really make it impossible to start up the
> > postmaster by mistake during the checksum generation. People don't
> > read the documentation until it's too late. And it might not even be
> > under their control - some automated tool might go in and try to start
> > postgres, and boom, corruption.
>
> I guess you're right.
>
> > One big-hammer method could be similar to what pg_upgrade does --
> > temporarily rename away the controlfile so postgresql can't start, and
> > when done, put it back.
>
> That sounds like a good solution to me. I've made PoC patch for that,
> see attached.
>

The downside with this method is we can't get a nice error message during
the attempted startup. But it should at least be safe, which is the most
important part. And at least it's clear what's happening once you list the
files and see the name of the temporary one.
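
For illustration, the lockout itself boils down to something like the
following around the long-running rewrite -- a simplified sketch using
the macro names quoted earlier in the thread, with error handling and
the interruption callback left out:

#include <stdio.h>
#include <stdlib.h>

#define CONTROL_FILE_PATH		"global/pg_control"
#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"

/* Sketch: rename pg_control away, do the long rewrite, rename it back. */
static void
enable_with_lockout(void)
{
	if (rename(CONTROL_FILE_PATH, CONTROL_FILE_PATH_TEMP) != 0)
	{
		fprintf(stderr, "could not rename \"%s\" to \"%s\"\n",
				CONTROL_FILE_PATH, CONTROL_FILE_PATH_TEMP);
		exit(1);
	}

	/* ... rewrite all blocks with correct checksums, possibly for hours ... */

	if (rename(CONTROL_FILE_PATH_TEMP, CONTROL_FILE_PATH) != 0)
	{
		fprintf(stderr, "could not rename \"%s\" back to \"%s\"\n",
				CONTROL_FILE_PATH_TEMP, CONTROL_FILE_PATH);
		exit(1);
	}
}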

//Magnus


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 4:54 PM Michael Banck 
wrote:

> Hi,
>
> Am Donnerstag, den 14.03.2019, 15:32 +0100 schrieb Magnus Hagander:
> > On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg  wrote:
> > > Re: Magnus Hagander 2019-03-14 <zmb8qck7ndmchey5...@mail.gmail.com>
> > > > Are you suggesting we should support running with a master with
> checksums
> > > > on and a standby with checksums off in the same cluster? That
> seems.. Very
> > > > fragile.
> > >
> > > The case "shut down master and standby, run pg_checksums on both, and
> > > start them again" should be supported. That seems safe to do, and a
> > > real-world use case.
> >
> > I can agree with that, if we can declare it safe. You might need some
> > way to ensure it was shut down cleanly on both sides, I'm guessing.
> >
> > > Changing the system id to a random number would complicate this.
> > >
> > > (Horrible idea: maybe just adding 1 (= checksum version) to the system
> > > id would work?)
> >
> > Or any other way of changing the systemid in a predictable way would
> > also work, right? As long as it's done the same on both sides. And
> > that way it would look different to any system that *doesn't* know
> > what it means, which is probably a good thing.
>
> If we change the system identifier, we'll have to reset the WAL as well
> or otherwise we'll get "PANIC:  could not locate a valid checkpoint
> record" on startup.  So even if we do it predictably on both primary and
> standby I guess the standby would need to be re-cloned?
>
> So I think an option that skips that for people who know what they are
> doing with the streaming replication setup would be required, should we
> decide to bump the system identifier.
>

Ugh. I did not think of that one. But yes, the main idea there would be
that if you turn on checksums on the primary then you have to re-clone all
standbys. That's what happens if we change the system identifier -- that's
why it's the "big hammer method".

But yeah, an option to avoid it could be one way to deal with it. If we
could find some safer way to handle it that'd be better, but otherwise
changing the sysid by default and having an option to turn it off could be
one way to deal with it.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Fri, Mar 15, 2019 at 1:49 AM Michael Paquier  wrote:

> On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote:
> > Are you suggesting we should support running with a master with checksums
> > on and a standby with checksums off in the same cluster? That seems..
> Very
> > fragile.
>
> Well, saying that it is supported is too strong a term for that.  What I
> am saying is that the problems you are pointing out are not as bad as
> you seem to mean they are as long as an operator does not copy on-disk
> pages from one node to the other one.  Knowing that checksums apply
> only to pages flushed on disk on a local node, everything going
> through WAL for availability is actually able to work fine:
> - PITR
> - archive recovery.
> - streaming replication.
> Reading the code I understand that.  I have as well done some tests
> with a primary/standby configuration to convince myself, using pgbench
> on both nodes (read-write for the primary, read-only on the standby),
> with checkpoint (or restart point) triggered on each node every 20s.
> If one node has checksum enabled and the other checksum disabled, then
> I am not seeing any inconsistency.
>
> However, anything which does a physical copy of pages could get things
> easily messed up if one node has checksum disabled and the other
> enabled.  One such tool is pg_rewind.  If the promoted standby has
> checksums disabled (becoming the source), and the old master to rewind
> has checksums enabled, then the rewind could likely copy pages which
> have not their checksums set correctly, resulting in incorrect
> checksums on the old master.
>
> So yes, it is easy to mess up things, however this does not apply to
> all configurations.  The suggestion from Christoph to enable checksums
> on both nodes separately would work, and personally I find the
> suggestion to update the system ID after enabling or disabling
> checksums an over-engineered design because of the reasons in the
> first part of this email (it is technically doable to enable checksums
> with a minimum downtime and a failover), so my recommendation would be
> to document that when enabling checksums on one instance in a cluster,
> it should be applied to all instances as it could cause problems with
> any tools performing a physical copy of relation files or blocks.
>

As I said, that's a big hammer. I'm all for having a better solution. But I
don't think it's acceptable not to have *any* defense against it, given how
bad corruption it can lead to.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 09:04:51AM +0100, Michael Banck wrote:
> ISTM this would not run fsync_parent_path() if the first fsync fails,
> which is not the intended use. I guess we need two ifs here?

Yes, let's do that.  Let's see if others have input to offer about the
patch.  This thread is gathering attention, which is good.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi,

Am Freitag, den 15.03.2019, 11:50 +0900 schrieb Michael Paquier:
> On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote:
> > Am Donnerstag, den 14.03.2019, 15:26 +0100 schrieb Magnus Hagander:
> > > One big-hammer method could be similar to what pg_upgrade does --
> > > temporarily rename away the controlfile so postgresql can't start, and
> > > when done, put it back.
> > 
> > That sounds like a good solution to me. I've made PoC patch for that,
> > see attached.
> 
> Indeed.  I did not know this trick from pg_upgrade.  We could just use
> the same.
> 
> > The only question is whether pg_checksums should try to move pg_control
> > back (i) on failure (ii) when interrupted?
> 
> Yes, we should have a callback on SIGINT and SIGTERM here which just
> moves the control file back in place if the temporary one exists.  I
> have been able to grab some time to incorporate the feedback gathered
> on this thread, and please find attached a new version of the patch to
> add --enable/--disable.  

Thanks!

One thing stood out to me while quickly looking over it:

+   /*
+    * Flush the control file and its parent path to make the change
+    * durable.
+    */
+   if (fsync_fname(controlfile_path, false, progname) != 0 ||
+   fsync_parent_path(controlfile_path, progname) != 0)
+   {
+   /* errors are already logged on failure */
+   exit(1);
+   }

ISTM this would not run fsync_parent_path() unless the first fsync fails,
which is not the intended use. I guess we need two ifs here?
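
For illustration, the two-ifs variant could look like this (just a sketch,
reusing the frontend fsync_fname()/fsync_parent_path() helpers quoted above):

    /* Flush the control file itself, then its parent directory. */
    if (fsync_fname(controlfile_path, false, progname) != 0)
    {
        /* errors are already logged on failure */
        exit(1);
    }
    if (fsync_parent_path(controlfile_path, progname) != 0)
    {
        /* errors are already logged on failure */
        exit(1);
    }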


Michael




Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Paquier
On Fri, Mar 15, 2019 at 11:50:27AM +0900, Michael Paquier wrote:
> - Rename the control file when beginning the enabling operation, with
> a callback to rename the file back if the operation is interrupted.
> 
> Does this make sense?

Just before I forget...  Please note that this handles interruptions
but not failures, based on the assumption that on a failure we can still
tell that the system was working on its checksums thanks to the temporary
control file, which is useful for debugging in my opinion.
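
For illustration, a minimal sketch of such a callback (variable scoping is
simplified here; MAXPGPATH and pqsignal() come from the usual PostgreSQL
headers, and the actual patch wires this into pg_checksums' option handling):

    #include <signal.h>
    #include <stdio.h>      /* rename() */
    #include <unistd.h>     /* _exit() */

    static char old_controlfile_path[MAXPGPATH];
    static char new_controlfile_path[MAXPGPATH];

    static void
    restore_controlfile(int signum)
    {
        /* move pg_control back in place so the cluster can start again */
        (void) rename(new_controlfile_path, old_controlfile_path);
        _exit(1);
    }

    /* ... in main(), before starting the long scan: */
    pqsignal(SIGINT, restore_controlfile);
    pqsignal(SIGTERM, restore_controlfile);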
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Paquier
On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote:
> On Thursday, 14.03.2019, at 15:26 +0100, Magnus Hagander wrote:
>> One big-hammer method could be similar to what pg_upgrade does --
>> temporarily rename away the controlfile so postgresql can't start, and
>> when done, put it back.
> 
> That sounds like a good solution to me. I've made a PoC patch for that,
> see attached.

Indeed.  I did not know this trick from pg_upgrade.  We could just use
the same approach.

> The only question is whether pg_checksums should try to move pg_control
> back (i) on failure (ii) when interrupted?

Yes, we should have a callback on SIGINT and SIGTERM here which just
moves the control file back in place if the temporary one exists.  I
have been able to grab some time to incorporate the feedback gathered
on this thread, and please find attached a new version of the patch to
add --enable/--disable.  The main changes are:
- When enabling checksums, fsync the data directory first, and at the
end update/flush the control file and its parent folder, as Fabien
reported.
- When disabling checksums, only work on the control file, as Fabien
has also reported.
- Rename the control file when beginning the enabling operation, with
a callback to rename the file back if the operation is interrupted.

Does this make sense?
--
Michael
From 2ebb032e7bea22829396e88ff9cc1b52f1b754d4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 13 Mar 2019 11:12:53 +0900
Subject: [PATCH] Add options to enable and disable checksums in pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When using --disable, only the control file is updated and then
flushed.  When using --enable, the process gets more complicated as the
operation can be long:
- Rename the control file to a temporary name, to prevent a parallel
startup of Postgres.
- Scan all files and update their checksums.
- Rename the control file back.
- Flush the data directory.
- Update the control file and then flush it, to make the change
durable.
If the operation is interrupted, the control file gets moved back in
place.

If no mode is specified in the options, then --check is used for
compatibility with older versions of pg_verify_checksums (now renamed to
pg_checksums in v12).

Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Magnus Hagander, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.gd4...@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml|  54 -
 src/bin/pg_checksums/pg_checksums.c   | 287 +++---
 src/bin/pg_checksums/t/002_actions.pl |  76 +--
 src/tools/pgindent/typedefs.list  |   1 +
 4 files changed, 370 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..dc41553bc4 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  
   pg_checksums
-  verify data checksums in a PostgreSQL database cluster
+  enable, disable or check data checksums in a PostgreSQL database cluster
  
 
  
@@ -36,11 +36,24 @@ PostgreSQL documentation
  
   Description
   
-   pg_checksums verifies data checksums in a
-   PostgreSQL cluster.  The server must be shut
-   down cleanly before running pg_checksums.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   pg_checksums checks, enables or disables data
+   checksums in a PostgreSQL cluster.  The server
+   must be shut down cleanly before running
+   pg_checksums. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
   
+
+  
+   Checking checksums requires scanning every file holding them in the data
+   folder.  Disabling checksums requires only an update of the file
+   pg_control.  Enabling checksums first renames
+   the file pg_control to
+   pg_control.pg_checksums_in_progress to prevent
+   a parallel startup of the cluster, then it updates all files with
+   checksums, and it finishes by renaming and updating
+   pg_control to mark checksums as enabled.
  
 
  
@@ -60,6 +73,37 @@ PostgreSQL documentation
   
  
 
+ 
+  -c
+  --check
+  
+   
+Checks checksums. This is the default mode if nothing else is
+specified.
+   
+  
+ 
+
+ 
+  -d
+  --disable
+  
+   
+Disables checksums.
+   
+  
+ 
+
+ 
+  -e
+  --enable
+  
+   
+Enables checksums.
+   

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Paquier
On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote:
> Are you suggesting we should support running with a master with checksums
> on and a standby with checksums off in the same cluster? That seems...
> very fragile.

Well, saying that it is supported is too strong a term for that.  What I
am saying is that the problems you are pointing out are not as bad as
you make them out to be, as long as an operator does not copy on-disk
pages from one node to the other.  Knowing that checksums apply
only to pages flushed on disk on a local node, everything going
through WAL for availability is actually able to work fine:
- PITR
- archive recovery.
- streaming replication.
Reading the code, I understand that.  I have also done some tests
with a primary/standby configuration to convince myself, using pgbench
on both nodes (read-write for the primary, read-only on the standby),
with checkpoint (or restart point) triggered on each node every 20s.
If one node has checksum enabled and the other checksum disabled, then
I am not seeing any inconsistency.

However, anything which does a physical copy of pages could get things
easily messed up if one node has checksum disabled and the other
enabled.  One such tool is pg_rewind.  If the promoted standby has
checksums disabled (becoming the source), and the old master to rewind
has checksums enabled, then the rewind could likely copy pages which
have not their checksums set correctly, resulting in incorrect
checksums on the old master.

So yes, it is easy to mess things up; however, this does not apply to
all configurations.  The suggestion from Christoph to enable checksums
on both nodes separately would work, and personally I find the
suggestion to update the system ID after enabling or disabling
checksums an over-engineered design because of the reasons in the
first part of this email (it is technically doable to enable checksums
with a minimum downtime and a failover), so my recommendation would be
to document that when enabling checksums on one instance in a cluster,
it should be applied to all instances as it could cause problems with
any tools performing a physical copy of relation files or blocks.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Banck
Hi,

On Thursday, 14.03.2019, at 15:32 +0100, Magnus Hagander wrote:
> On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg  wrote:
> > Re: Magnus Hagander 2019-03-14 
> > 
> > > Are you suggesting we should support running with a master with checksums
> > > on and a standby with checksums off in the same cluster? That seems...
> > > very fragile.
> > 
> > The case "shut down master and standby, run pg_checksums on both, and
> > start them again" should be supported. That seems safe to do, and a
> > real-world use case.
> 
> I can agree with that, if we can declare it safe. You might need some
> way to ensure it was shut down cleanly on both sides, I'm guessing. 
> 
> > Changing the system id to a random number would complicate this.
> > 
> > (Horrible idea: maybe just adding 1 (= checksum version) to the system
> > id would work?)
> 
> Or any other way of changing the systemid in a predictable way would
> also work, right? As long as it's done the same on both sides. And
> that way it would look different to any system that *doesn't* know
> what it means, which is probably a good thing.

If we change the system identifier, we'll have to reset the WAL as well
as otherwise we'll get "PANIC:  could not locate a valid checkpoint
record" on startup.  So even if we do it predictably on both primary and
standby I guess the standby would need to be re-cloned?

So I think an option that skips that for people who know what they are
doing with the streaming replication setup would be required, should we
decide to bump the system identifier.


Michael




Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Banck
Hi,

On Thursday, 14.03.2019, at 15:26 +0100, Magnus Hagander wrote:
> Given that the failure is data corruption, I don't think a big fat
> warning is enough. We should really make it impossible to start up the
> postmaster by mistake during the checksum generation. People don't
> read the documentation until it's too late. And it might not even be
> under their control - some automated tool might go in and try to start
> postgres, and boom, corruption.

I guess you're right.

> One big-hammer method could be similar to what pg_upgrade does --
> temporarily rename away the controlfile so postgresql can't start, and
> when done, put it back.

That sounds like a good solution to me. I've made a PoC patch for that,
see attached.

The only question is whether pg_checksums should try to move pg_control
back (i) on failure (ii) when interrupted?


Michael

From 9b1471afee485a96c32e393989ca4eb0aed52f88 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Thu, 14 Mar 2019 16:02:49 +0100
Subject: [PATCH] Rename away pg_control while enabling checksums.

In order to prevent the cluster from being accidentally started during the
operation, the pg_control file is renamed to
pg_control.pg_checksums_in_progress while the operation runs and renamed back
at the end.
---
 doc/src/sgml/ref/pg_checksums.sgml  |  5 +++--
 src/bin/pg_checksums/pg_checksums.c | 36 
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c3ccbf4eb7..91dfd3515a 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -48,8 +48,9 @@ PostgreSQL documentation
   
While checking or enabling checksums needs to scan or write every file in
the cluster, disabling will only update the file
-   pg_control.
-  
+   pg_control. During enabling,
+   the pg_control  file is renamed in order to
+   prevent accidentally starting the cluster.
  
 
  
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 852ddafc94..590fb925e8 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -55,6 +55,12 @@ typedef enum
 #define PG_TEMP_FILES_DIR "pgsql_tmp"
 #define PG_TEMP_FILE_PREFIX "pgsql_tmp"
 
+#ifndef WIN32
+#define pg_mv_file			rename
+#else
+#define pg_mv_file			pgrename
+#endif
+
 static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
@@ -88,6 +94,7 @@ usage(void)
  */
 static const char *const skip[] = {
 	"pg_control",
+	"pg_control.pg_checksums_in_progress",
 	"pg_filenode.map",
 	"pg_internal.init",
 	"PG_VERSION",
@@ -306,6 +313,8 @@ main(int argc, char *argv[])
 	};
 
 	char	   *DataDir = NULL;
+	char		old_controlfile_path[MAXPGPATH];
+	char		new_controlfile_path[MAXPGPATH];
 	int			c;
 	int			option_index;
 	bool		crc_ok;
@@ -440,6 +449,23 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/*
+	 * Disable cluster by renaming pg_control when enabling checksums so
+	 * that it cannot be started by accident during the operation
+	 */
+	if (mode == PG_MODE_ENABLE)
+	{
+		snprintf(old_controlfile_path, sizeof(old_controlfile_path),
+ "%s/global/pg_control", DataDir);
+		snprintf(new_controlfile_path, sizeof(new_controlfile_path),
+ "%s/global/pg_control.pg_checksums_in_progress", DataDir);
+		if (pg_mv_file(old_controlfile_path, new_controlfile_path) != 0)
+		{
+			fprintf(stderr, _("%s: unable to rename %s to %s.\n"), progname, old_controlfile_path, new_controlfile_path);
+			exit(1);
+		}
+	}
+
 	/* Operate on all files if checking or enabling checksums */
 	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
 	{
@@ -466,6 +492,16 @@ main(int argc, char *argv[])
 	 */
 	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
 	{
+
+		/* Move back pg_control */
+		if (mode == PG_MODE_ENABLE)
+		{
+			if (pg_mv_file(new_controlfile_path, old_controlfile_path) != 0)
+			{
+fprintf(stderr, _("%s: unable to rename %s to %s.\n"), progname, new_controlfile_path, old_controlfile_path);
+exit(1);
+			}
+		}
 		/* Update control file */
 		ControlFile->data_checksum_version =
 			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
-- 
2.11.0



Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg  wrote:

> Re: Magnus Hagander 2019-03-14 <zmb8qck7ndmchey5...@mail.gmail.com>
> > Are you suggesting we should support running with a master with checksums
> > on and a standby with checksums off in the same cluster? That seems...
> > very fragile.
>
> The case "shut down master and standby, run pg_checksums on both, and
> start them again" should be supported. That seems safe to do, and a
> real-world use case.
>

I can agree with that, if we can declare it safe. You might need some way
to ensure it was shut down cleanly on both sides, I'm guessing.


Changing the system id to a random number would complicate this.
>
> (Horrible idea: maybe just adding 1 (= checksum version) to the system
> id would work?)
>

Or any other way of changing the systemid in a predictable way would also
work, right? As long as it's done the same on both sides. And that way it
would look different to any system that *doesn't* know what it means, which
is probably a good thing.



Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Christoph Berg
Re: Magnus Hagander 2019-03-14 

> Are you suggesting we should support running with a master with checksums
> on and a standby with checksums off in the same cluster? That seems.. Very
> fragile.

The case "shut down master and standby, run pg_checksums on both, and
start them again" should be supported. That seems safe to do, and a
real-world use case.

Changing the system id to a random number would complicate this.

(Horrible idea: maybe just adding 1 (= checksum version) to the system
id would work?)

Christoph



Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 5:39 AM Michael Paquier  wrote:

> On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote:
> > On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:
> > > The attached patch should do the above, on top of Michael's last
> > > patchset.
> >
> > What you are doing here looks like a good defense in itself.
>
> More thoughts on that, and here is a short summary of the thread.
>
> +   /* Check if control file has changed */
> +   if (controlfile_last_updated != ControlFile->time)
> +   {
> +   fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
> +   exit(1);
> +   }
> Actually, under the conditions discussed on this thread where Postgres
> is started in parallel with pg_checksums, imagine the following
> scenario:
> - pg_checksums starts to enable checksums, it reads a block to
> calculate its checksum, then calculates it.
> - Postgres has been started in parallel, writes the same block.
> - pg_checksums finishes the block calculation, writes back the block
> it has just read.
> - Postgres stops, some data is lost.
> - At the end of pg_checksums, we complain that the control file has
> been updated since the start of pg_checksums.
> I think that we should be way more noisy about this error message and
> document properly that Postgres should not be started while checksums
> are being enabled.  Basically, I guess that it should mention that there is
> a risk of corruption because of this parallel operation.
>
> Hence, based on what I could read on this thread, we'd like to have
> the following things added to the core patch set:
> 1) When enabling checksums, fsync the data folder.  Then update the
> control file, and finally fsync the control file (+ flush of the
> parent folder to make the whole durable).  This way a host crash never
> actually means that we finish in an inconsistent state.
> 2) When checksums are disabled, update the control file, then fsync
> it + its parent folder.
> 3) Add tracking of the control file data, and complain loudly before
> trying to update the file if there are any inconsistencies found.
> 4) Document with a big-fat-red warning that postgres should not be
> worked on while the tool is enabling or disabling checksums.
>

Given that the failure is data corruption, I don't think a big fat warning is
enough. We should really make it impossible to start up the postmaster by
mistake during the checksum generation. People don't read the documentation
until it's too late. And it might not even be under their control - some
automated tool might go in and try to start postgres, and boom, corruption.

One big-hammer method could be similar to what pg_upgrade does --
temporarily rename away the controlfile so postgresql can't start, and when
done, put it back.



Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 1:23 AM Michael Paquier  wrote:

> On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote:
> > Enabling or disabling the checksums offline on the master quite clearly
> > requires a rebuild of the standby, there is no other way (this is one of
> > the reasons for the online enabling in that patch, so I still hope we can
> > get that done -- but not for this version).
>
> I am curious to understand why this would require a rebuild of the
> standby.  Technically FPWs don't update the checksum of a page when it
> is WAL-logged, so even if a primary and a standby don't agree on the
> checksum configuration, it is the timing of page flushes in
> the local instance that counts for checksum correctness.
>

Are you suggesting we should support running with a master with checksums
on and a standby with checksums off in the same cluster? That seems...
very fragile.



Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Banck
Hi,

On Wednesday, 13.03.2019, at 17:54 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 4:51 PM Michael Banck  
> wrote:
> > On Wednesday, 13.03.2019, at 12:43 +0100, Magnus Hagander wrote:
> > > I think this is dangerous enough that it needs to be enforced and not
> > > documented.
> > 
> > Changing the cluster ID might have some other side-effects, I think
> > there are several cloud-native 3rd party solutions that use the cluster
> > ID as some kind of unique identifier for an instance. It might not be an
> > issue in practise, but then again, it might break other stuff down the
> > road.
> 
> Well, whatever we do they have to update, right? 

Yeah, but I am saying their orchestrators might get confused about where
the old instance went and what this new thing with a totally different
systemid is and lose the connection between the two. 

Maybe that is a feature and not a bug.


Michael




Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:
> > The attached patch should do the above, on top of Michael's last
> > patchset.
> 
> What you are doing here looks like a good defense in itself.

More thoughts on that, and here is a short summary of the thread.

+   /* Check if control file has changed */
+   if (controlfile_last_updated != ControlFile->time)
+   {
+   fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
+   exit(1);
+   }
Actually, under the conditions discussed on this thread where Postgres
is started in parallel with pg_checksums, imagine the following
scenario:
- pg_checksums starts to enable checksums, it reads a block to
calculate its checksum, then calculates it.
- Postgres has been started in parallel, writes the same block.
- pg_checksums finishes the block calculation, writes back the block
it has just read.
- Postgres stops, some data is lost.
- At the end of pg_checksums, we complain that the control file has
been updated since the start of pg_checksums.
I think that we should be way more noisy about this error message and
document properly that Postgres should not be started while checksums
are being enabled.  Basically, I guess that it should mention that there is
a risk of corruption because of this parallel operation.

Hence, based on what I could read on this thread, we'd like to have
the following things added to the core patch set:
1) When enabling checksums, fsync the data folder.  Then update the
control file, and finally fsync the control file (+ flush of the
parent folder to make the whole durable).  This way a host crash never
actually means that we finish in an inconsistent state.
2) When checksums are disabled, update the control file, then fsync
it + its parent folder (see the sketch just after this list).
3) Add tracking of the control file data, and complain loudly before
trying to update the file if there are any inconsistencies found.
4) Document with a big-fat-red warning that postgres should not be
worked on while the tool is enabling or disabling checksums.
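
To illustrate points 1) and 2), a rough sketch of the intended ordering,
using the existing frontend helpers (the construction of controlfile_path
is omitted):

    if (mode == PG_MODE_ENABLE)
        fsync_pgdata(DataDir, progname, PG_VERSION_NUM);    /* 1. data first */

    /* 2. then switch the control file... */
    ControlFile->data_checksum_version =
        (mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
    update_controlfile(DataDir, progname, ControlFile);

    /* 3. ...and make that change durable */
    if (fsync_fname(controlfile_path, false, progname) != 0)
        exit(1);
    if (fsync_parent_path(controlfile_path, progname) != 0)
        exit(1);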

There is a part of the discussion about standbys and primaries not having
the same checksum settings, but I commented on that in [1].

There is a secondary bug fix in [2] to prevent the tool from running if
the data folder has been initialized with a block size different from what
pg_checksums has been compiled with.  The patch looks good, but the error
message could be improved per my review.

[1]: https://www.postgresql.org/message-id/20190314002342.gc3...@paquier.xyz
[2]: https://www.postgresql.org/message-id/20190313224742.ga3...@paquier.xyz

Am I missing something?
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote:
> Enabling or disabling the checksums offline on the master quite clearly
> requires a rebuild of the standby, there is no other way (this is one of
> the reasons for the online enabling in that patch, so I still hope we can
> get that done -- but not for this version).

I am curious to understand why this would require a rebuild of the
standby.  Technically FPWs don't update the checksum of a page when it
is WAL-logged, so even if a primary and a standby don't agree on the
checksum configuration, it is the timing of page flushes in
the local instance that counts for checksum correctness.

> You mean if the backend and pg_checksums are built with different blocksizes?
> Yeah, that sounds like something which is a cheap check and should be done.

Yes, we should check for that: the checksum calculation uses BLCKSZ as
a hardcoded value, so a mismatch would cause computation failures.  It
could be possible to not have this restriction if we made the block
size an argument of the checksum calculation though.
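
Hypothetically, that could mean a signature along these lines (no such
function exists today; pg_checksum_page() currently assumes BLCKSZ):

    /* hypothetical variant taking the block size as an argument */
    uint16 pg_checksum_page_ext(char *page, BlockNumber blkno, uint32 blcksz);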
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote:
> Seems good. And I think we need to backpatch this check to pg11, similar
> to the cross-version compatibility checks.

+fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"),
+progname, ControlFile->blcksz, BLCKSZ);
The error message looks grammatically a bit weird to me.  What about
the following?  Say:
"database block size of %u is different from supported block size of
%u."
Better ideas are welcome.

Please note that those integers are unsigned, by the way.
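
For instance, a sketch with unsigned format specifiers:

    fprintf(stderr, _("%s: database block size of %u is different from supported block size of %u.\n"),
            progname, ControlFile->blcksz, BLCKSZ);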
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 4:51 PM Michael Banck wrote:

> Hi,
>
> On Wednesday, 13.03.2019, at 12:43 +0100, Magnus Hagander wrote:
> > I think this is dangerous enough that it needs to be enforced and not
> > documented.
>
> Changing the cluster ID might have some other side-effects, I think
> there are several cloud-native 3rd party solutions that use the cluster
> ID as some kind of unique identifier for an instance. It might not be an
> issue in practice, but then again, it might break other stuff down the
> road.
>

Well, whatever we do they have to update, right? If we're not changing it,
then we're basically saying that it's (systemid, checksums) that is the
identifier of the cluster, not just systemid. They'd have to go around and
check each node individually for the configuration and not just use
systemid anyway, so what's the actual win?


Another possibility would be to extend the replication protocol's
> IDENTIFY_SYSTEM command to also report the checksum version so that the
> standby can check against the local control file on startup. But I am
> not sure we can easily extend IDENTIFY_SYSTEM this way nor whether we
> should for this rather corner-casey thing?
>

We could, but is it really a win in those scenarios, versus just making the
systemid different? With the systemid being different it's obvious that
something needs to be done. If it's not, then at best, if we check it at
standby startup, the standby won't start. But people can still end up
with things like unusable/corrupt backups, for example.



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 4:46 PM Michael Banck wrote:

> Hi,
>
> On Wednesday, 13.03.2019, at 12:24 +0100, Magnus Hagander wrote:
> > On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov  wrote:
> > > One new question from me: how about replication?
> > > Case: primary+replica, we shut down primary and enable checksum, and
> > > "started streaming WAL from primary" without any issue. I have
> > > master with checksums, but replica without.
> > > Or cluster with checksums, then disable checksums on primary, but
> > > standby think we have checksums.
> >
> > Enabling or disabling the checksums offline on the master quite
> > clearly requires a rebuild of the standby, there is no other way
>
> What about shutting down both and running pg_checksums --enable on the
> standby as well?
>

That sounds pretty fragile to me. But if we can prove that the user has
done things in the right order, sure. But how can we do that in an offline
process? What if the user just quickly restarted the primary node after the
standby had been shut down? We'll need to somehow validate it across the
nodes...


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Hi,

On Wednesday, 13.03.2019, at 12:43 +0100, Magnus Hagander wrote:
> I think this is dangerous enough that it needs to be enforced and not
> documented.

Changing the cluster ID might have some other side-effects, I think
there are several cloud-native 3rd party solutions that use the cluster
ID as some kind of unique identifier for an instance. It might not be an
issue in practice, but then again, it might break other stuff down the
road.

Another possibility would be to extend the replication protocol's 
IDENTIFY_SYSTEM command to also report the checksum version so that the
standby can check against the local control file on startup. But I am
not sure we can easily extend IDENTIFY_SYSTEM this way nor whether we
should for this rather corner-casey thing?
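
Purely as an illustration of the idea (no such column exists today, and the
names below are hypothetical), the standby-side check could look something
like:

    /* IDENTIFY_SYSTEM with a hypothetical extra data_checksum_version column */
    PGresult   *res = PQexec(conn, "IDENTIFY_SYSTEM");
    int         remote_version = atoi(PQgetvalue(res, 0, 4));

    if (remote_version != ControlFile->data_checksum_version)
        ereport(FATAL,
                (errmsg("checksum setting differs between primary (%d) and standby (%d)",
                        remote_version, ControlFile->data_checksum_version)));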


Michael




Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Hi,

On Wednesday, 13.03.2019, at 12:24 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov  wrote:
> > One new question from me: how about replication?
> > Case: primary + replica, we shut down the primary and enable checksums, and
> > the standby "started streaming WAL from primary" without any issue. I have
> > a master with checksums, but a replica without.
> > Or a cluster with checksums, then disable checksums on the primary, but
> > the standby thinks we have checksums.
> 
> Enabling or disabling the checksums offline on the master quite
> clearly requires a rebuild of the standby, there is no other way

What about shutting down both and running pg_checksums --enable on the
standby as well?


Michael




Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote:
> Seems good. And I think we need to backpatch this check to pg11, similar
> to the cross-version compatibility checks.

Good point raised, a backpatch looks appropriate.  It would be nice to get
to something more dynamic, but pg_checksum_block() uses BLCKSZ
directly :(
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:
> The attached patch should do the above, on top of Michael's last
> patchset.

What you are doing here looks like a good defense in itself.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Sergei Kornilov
Hi,

>>  > Also, do we support ./configure --with-blocksize=(not equal to 8)? make
>>  > check on HEAD fails for me. If we support this, I think we need to
>>  > recheck BLCKSZ between the compiled pg_checksums and the one used in PGDATA
>>
>>  You mean if the backend and pg_checksums are built with different
>>  blocksizes? Yeah, that sounds like something which is a cheap check and
>>  should be done.
>
> I've been doing that in my pg_checksums fork for a while (as it is further
> removed from the Postgres binaries), but yeah, we should check for that as
> well in pg_checksums, see attached patch.

Seems good. And I think we need to backpatch this check to pg11, similar to
the cross-version compatibility checks.

regards, Sergei



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 12:40 PM Sergei Kornilov  wrote:

> Hi
>
> >> One new question from me: how about replication?
> >> Case: primary + replica, we shut down the primary and enable checksums, and
> >> the standby "started streaming WAL from primary" without any issue. I have
> >> a master with checksums, but a replica without.
> >> Or a cluster with checksums, then disable checksums on the primary, but
> >> the standby thinks we have checksums.
> >
> > Enabling or disabling the checksums offline on the master quite clearly
> requires a rebuild of the standby, there is no other way (this is one of
> the reasons for the online enabling in that patch, so I still hope we can
> get that done -- but not for this version).
>
> I mean this should at least be documented.
> Changing the system id... maybe that is reasonable.
>

I think this is dangerous enough that it needs to be enforced and not
documented.

Most people who care about checksums are also going to have either
replication or backups...


>> Also, do we support ./configure --with-blocksize=(not equal to 8)? make check
>> on HEAD fails for me. If we support this, I think we need to recheck BLCKSZ
>> between the compiled pg_checksums and the one used in PGDATA
> >
> > You mean if the backend and pg_checksums are built with different
> > blocksizes? Yeah, that sounds like something which is a cheap check and
> > should be done.
>
> Yep
>

This one I could live with being only a documented problem rather
than an enforced one, but it also seems very simple to enforce.



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Sergei Kornilov
Hi

>> One new question from me: how about replication?
>> Case: primary + replica, we shut down the primary and enable checksums, and
>> the standby "started streaming WAL from primary" without any issue. I have
>> a master with checksums, but a replica without.
>> Or a cluster with checksums, then disable checksums on the primary, but
>> the standby thinks we have checksums.
>
> Enabling or disabling the checksums offline on the master quite clearly 
> requires a rebuild of the standby, there is no other way (this is one of the 
> reasons for the online enabling in that patch, so I still hope we can get 
> that done -- but not for this version).

I mean this should at least be documented.
Changing the system id... maybe that is reasonable.

>> Also, do we support ./configure --with-blocksize=(not equal to 8)? make check
>> on HEAD fails for me. If we support this, I think we need to recheck BLCKSZ
>> between the compiled pg_checksums and the one used in PGDATA
>
> You mean if the backend and pg_checksums are built with different blocksizes?
> Yeah, that sounds like something which is a cheap check and should be done.

Yep

regards, Sergei



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Hi,

On Wednesday, 13.03.2019, at 12:24 +0100, Magnus Hagander wrote:
> > Also, do we support ./configure --with-blocksize=(not equal to 8)? make
> > check on HEAD fails for me. If we support this, I think we need to
> > recheck BLCKSZ between the compiled pg_checksums and the one used in PGDATA
> 
> You mean if the backend and pg_checksums are built with different
> blocksizes? Yeah, that sounds like something which is a cheap check and
> should be done.

I've been doing that in my pg_checksums fork for a while (as it is further
removed from the Postgres binaries), but yeah, we should check for that as
well in pg_checksums, see attached patch.


Michael

From f942136e09cd54b1032c7c5d9b4f3305e7dc043f Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Wed, 13 Mar 2019 12:27:44 +0100
Subject: [PATCH] Check data directory block size

---
 src/bin/pg_checksums/pg_checksums.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index afca6cf027..dfd522ca6a 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -442,6 +442,17 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/*
+	 * Check that the PGDATA blocksize is the same as the one pg_checksums
+	 * was compiled against (BLCKSZ).
+	 */
+	if (ControlFile->blcksz != BLCKSZ)
+	{
+		fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"),
+progname, ControlFile->blcksz, BLCKSZ);
+		exit(1);
+	}
+
 	/* Save time of last control file modification */
 	controlfile_last_updated = ControlFile->time;
 
-- 
2.11.0



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov  wrote:

> Hi
>
> One new question from me: how about replication?
> Case: primary + replica, we shut down the primary and enable checksums, and
> the standby "started streaming WAL from primary" without any issue. I have
> a master with checksums, but a replica without.
> Or a cluster with checksums, then disable checksums on the primary, but
> the standby thinks we have checksums.
>

Enabling or disabling the checksums offline on the master quite clearly
requires a rebuild of the standby, there is no other way (this is one of
the reasons for the online enabling in that patch, so I still hope we can
get that done -- but not for this version).

You would have the same with PITR backups for example. And especially if
you have some tool that does block or segment level differential.

Of course, we have to make sure that this actually fails.

I wonder if we should bring out the big hammer and actually change the
system id in pg_control when checksums are enabled/disabled by this tool?
That should make it clear to any tool that it's changed.
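
As an illustration only (note the WAL and re-cloning concerns raised
elsewhere in this thread), the big-hammer variant could be as small as a
predictable transformation of the identifier before rewriting pg_control:

    /* illustration: derive a new, predictable system identifier */
    ControlFile->system_identifier += 1;    /* e.g. Christoph's +1 idea */
    update_controlfile(DataDir, progname, ControlFile);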


> Also, do we support ./configure --with-blocksize=(not equal to 8)? make check
> on HEAD fails for me. If we support this, I think we need to recheck BLCKSZ
> between the compiled pg_checksums and the one used in PGDATA
>

You mean if the backend and pg_checksums are built with different blocksizes?
Yeah, that sounds like something which is a cheap check and should be done.



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Hi,

On Wednesday, 13.03.2019, at 11:47 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 11:41 AM Michael Banck  
> wrote:
> > I propose we re-read the control file for the enable case after we
> > finished operating on all files and (i) check the instance is still
> > offline and (ii) update the checksums version from there. That should be
> > a small but worthwhile change that could be done anyway.
> 
> In (i) you need to also check that it has not been online *again*. Somebody
> could start *and* stop the database while pg_checksums is running. But
> it should hopefully be enough to check the time field?

Good point.

> > Also, I suggest to maybe add a notice in verbose mode that we are
> > syncing the data directory - otherwise the user might wonder what's
> > going on at 100% done, though I haven't seen a large delay in my tests
> > so far.
> 
> Seems like a good idea -- there certainly could be a substantial delay
> there depending on data size and underlying storage.

The attached patch should do the above, on top of Michael's last
patchset.


Michael

From 731c48d21b0adfa74e7ce1e05b2d8f1146b9e3cf Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Wed, 13 Mar 2019 12:05:34 +0100
Subject: [PATCH] Guard against concurrent cluster changes when enabling
 checksums.

Re-read the control file after operating on all files in order to check whether
the instance is still shut down and the control file still has the same
modification timestamp.

In passing, add a note about syncing the data directory in verbose mode.
---
 src/bin/pg_checksums/pg_checksums.c | 38 ++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0e464299f3..afca6cf027 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -309,6 +309,7 @@ main(int argc, char *argv[])
 	int			c;
 	int			option_index;
 	bool		crc_ok;
+	pg_time_t	controlfile_last_updated;
 
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_checksums"));
 
@@ -399,7 +400,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	/* Check if cluster is running */
+	/* Get control file data */
	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
 	{
@@ -414,6 +415,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Check if cluster is running */
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
@@ -440,6 +442,9 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Save time of last control file modification */
+	controlfile_last_updated = ControlFile->time;
+
 	/* Operate on all files if checking or enabling checksums */
 	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
 	{
@@ -461,17 +466,44 @@ main(int argc, char *argv[])
 	}
 
 	/*
-	 * Finally update the control file, flushing the data directory at the
-	 * end.
+	 * Finally update the control file after checking the cluster is still
+	 * offline and its control file has not changed, flushing the data
+	 * directory at the end.
 	 */
 	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
 	{
+		ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+		if (!crc_ok)
+		{
+			fprintf(stderr, _("%s: pg_control CRC value is incorrect\n"), progname);
+			exit(1);
+		}
+
+		/* Check if cluster is running */
+		if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+		{
+			fprintf(stderr, _("%s: cluster no longer shut down\n"), progname);
+			exit(1);
+		}
+
+		/* Check if control file has changed */
+		if (controlfile_last_updated != ControlFile->time)
+		{
+			fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
+			exit(1);
+		}
+
 		/* Update control file */
 		ControlFile->data_checksum_version =
 			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
 		update_controlfile(DataDir, progname, ControlFile);
 		if (do_sync)
+		{
+			if (verbose && mode == PG_MODE_ENABLE)
+printf(_("Syncing data directory\n"));
 			fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		}
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 		if (mode == PG_MODE_ENABLE)
-- 
2.11.0



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO


Hallo Michael,


I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.


That looks like a simple but mostly effective guard.


Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance - either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
"pg_checksums in progress' or some other trigger file.


I think that a clear cluster-locking can-be-overridden-if-needed
shared-between-commands mechanism would be a good thing (tm), although it 
requires some work.


My initial suggestion was to update the control file with an appropriate 
state, e.g. some general "admin command in progress", but I understood that 
it was rejected, and for another of your patches it seems that the 
"postmaster.pid" file is the right approach. Fine with me; the point is 
that it should be effective and consistent across all relevant commands.


A good point about the "postmaster.pid" trick, when it does not contain 
the postmaster pid, is that overriding is as simple as "rm postmaster.pid".


It could be possible to reach a state where the control file has 
checksums enabled and some blocks are not correctly synced, still you 
would notice rather quickly if the server is in an incorrect state at 
the follow-up startup.


Would you?  I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway. 


Note that fsync_pgdata is kind of heavy; it recurses everywhere. I think 
that a simple fsync on the control file only is enough.
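
In frontend terms that could be as small as (a sketch, using the existing
fsync_fname() helper from src/common/file_utils.c, with controlfile_path
declared as char[MAXPGPATH]):

    snprintf(controlfile_path, sizeof(controlfile_path),
             "%s/global/pg_control", DataDir);
    if (fsync_fname(controlfile_path, false, progname) != 0)
        exit(1);            /* error already reported by fsync_fname */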



Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.


I agree, as it might not be cheap.

--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Sergei Kornilov
Hi

One new question from me: how about replication?
Case: primary + replica, we shut down the primary and enable checksums, and the
standby "started streaming WAL from primary" without any issue. I have a master
with checksums, but a replica without.
Or a cluster with checksums, then disable checksums on the primary, but the
standby thinks we have checksums.

Also, do we support ./configure --with-blocksize=(not equal to 8)? make check on
HEAD fails for me. If we support this, I think we need to recheck BLCKSZ between
the compiled pg_checksums and the one used in PGDATA.

regards, Sergei



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 11:41 AM Michael Banck wrote:

> On Wednesday, 13.03.2019, at 18:31 +0900, Michael Paquier wrote:
> > On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> > > I'm not sure of the punctuation logic on the help line: the first sentence
> > > does not end with a ".". I could not find an instance of this style in other
> > > help on pg commands. I'd suggest "check data checksums (default)" would work
> > > around and be more in line with other commands' help.
> >
> > Good idea, let's do that.
> >
> > > I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> > > then started a "pg_checksums --enable" on a stopped cluster, then started
> > > the cluster while the enabling was in progress, then connected and updated
> > > data.
> >
> > Well, yes, don't do that.  You can get into the same class of problems
> > while running pg_rewind, pg_basebackup or even pg_resetwal once the
> > initial control file check is done for each one of these tools.
> >
> > > I do not think it is a good thing that two commands can write to the data
> > > directory at the same time, really.
> >
> > We don't prevent a pg_resetwal and a pg_basebackup from running in
> > parallel either.  That would be...  Interesting.
>
> But does pg_basebackup actually change the primary's data directory? I
> don't think so, so that does not seem to be a problem.
>
> pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
> pg_checksums can potentially run for hours, so I see the point of taking
> extra care here.
>
> On the other hand, two pg_checksums running in parallel also do not seem
> like much of a problem, as the cluster is offline anyway.
>
> What is much more of a footgun is one DBA starting pg_checksums --enable
> on a 1TB cluster, then going for lunch, and then the other DBA wondering
> why the DB is down and starting the instance again.
>
> We read the control file on pg_checksums' startup, so once pg_checksums
> finishes it'll write the old checkpoint LSN into pg_control (along with
> the updated checksum version). This is pilot error, but I think we
> should try to guard against it.
>
> I propose we re-read the control file for the enable case after we
> finished operating on all files and (i) check the instance is still
> offline and (ii) update the checksums version from there. That should be
> a small but worthwhile change that could be done anyway.
>

In (i) you need to also check that it has not been online *again*. Somebody could
start *and* stop the database while pg_checksums is running. But it
should hopefully be enough to check the time field?


Another option would be to add a new feature which reliably blocks an
> instance from starting up due to maintenance - either a new control file
> field, some message in postmaster.pid (like "pg_checksums maintenance in
> progress") that would prevent pg_ctl or postgres/postmaster from
> starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
> "pg_checksums in progress' or some other trigger file.
>

Instead of overloading yet another thing on postmaster.pid, it might be
better to just have a separate file that if it exists, blocks startup with
a message defined as the content of that file?
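
For illustration, a sketch of what such a check might look like early in
postmaster startup (the file name "postmaster.block" is invented here, and
this assumes the working directory is already the data directory):

    FILE       *blockfile = fopen("postmaster.block", "r");

    if (blockfile != NULL)
    {
        char        reason[1024];

        if (fgets(reason, sizeof(reason), blockfile) == NULL)
            strcpy(reason, "unknown maintenance operation");
        ereport(FATAL,
                (errmsg("cluster is blocked from starting up: %s", reason)));
    }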


> It
> > could be possible to reach a state where the control file has
> > checksums enabled and some blocks are not correctly synced, still you
> > would notice rather quickly if the server is in an incorrect state at
> > the follow-up startup.
>
> Would you?  I think I'm with Fabien on this one and it seems worthwhile
> to run fsync_pgdata() before and after update_controlfile() - the second
> one should be really quick anyway.
>
> Also, I suggest to maybe add a notice in verbose mode that we are
> syncing the data directory - otherwise the user might wonder what's
> going on at 100% done, though I haven't seen a large delay in my tests
> so far.
>

Seems like a good idea -- there certainly could be a substantial delay
there depending on data size and underlying storage.



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO


Hello,


Yep. That is the issue I think is preventable by fsyncing updated data
*then* writing & syncing the control file, and that should be done by
pg_checksums.


Well, pg_rewind works similarly: control file gets updated and then
the whole data directory gets flushed.


So it is basically prone to the same potential issue?

In my opinion, the take here is that we log something after the sync of
the whole data folder is done, so that in the event of a crash an operator
can make sure that everything has happened.


I do not understand. I'm basically only suggesting to reorder 3 lines and
add an fsync so that this potential problem goes away, see the attached PoC
(which does not compile because pg_fsync is backend-only; it works with
plain fsync on Linux, but I'm unsure of the portability, so probably
pg_fsync should be moved to port or something).


--
Fabien.diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index a7c39ac99a..1d7dd52ad0 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -454,17 +454,16 @@ main(int argc, char *argv[])
 		}
 	}
 
-	/*
-	 * Finally update the control file, flushing the data directory at the
-	 * end.
-	 */
+	/* Flush the data directory and update the control file */
 	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
 	{
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+
 		/* Update control file */
 		ControlFile->data_checksum_version =
 			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
 		update_controlfile(DataDir, progname, ControlFile);
-		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 		if (mode == PG_MODE_ENABLE)
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 71e67a2eda..0f599826e0 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -145,8 +145,8 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
  *
  * Update controlfile values with the contents given by caller.  The
  * contents to write are included in "ControlFile".  Note that it is up
- * to the caller to fsync the updated file, and to properly lock
- * ControlFileLock when calling this routine in the backend.
+ * to the caller to properly lock ControlFileLock when calling this
+ * routine in the backend.
  */
 void
 update_controlfile(const char *DataDir, const char *progname,
@@ -216,6 +216,20 @@ update_controlfile(const char *DataDir, const char *progname,
 #endif
 	}
 
+	if (pg_fsync(fd) != 0)
+	{
+#ifndef FRONTEND
+		ereport(PANIC,
+(errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+		ControlFilePath)));
+#else
+		fprintf(stderr, _("%s: could not fsync \"%s\": %s\n"),
+progname, ControlFilePath, strerror(errno));
+		exit(EXIT_FAILURE);
+#endif
+	}
+
 #ifndef FRONTEND
 	if (CloseTransientFile(fd))
 		ereport(PANIC,


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
On Wednesday, 13.03.2019, at 18:31 +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> > I'm not sure of the punctuation logic on the help line: the first sentence
> > does not end with a ".". I could not find an instance of this style in other
> > help on pg commands. I'd suggest "check data checksums (default)" would work
> > around and be more in line with other commands' help.
> 
> Good idea, let's do that.
> 
> > I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> > then started a "pg_checksums --enable" on a stopped cluster, then started
> > the cluster while the enabling was in progress, then connected and updated
> > data.
> 
> Well, yes, don't do that.  You can get into the same class of problems
> while running pg_rewind, pg_basebackup or even pg_resetwal once the
> initial control file check is done for each one of these tools.
> 
> > I do not think it is a good thing that two commands can write to the data
> > directory at the same time, really.
> 
> We don't prevent a pg_resetwal and a pg_basebackup from running in
> parallel either.  That would be...  Interesting.

But does pg_basebackup actually change the primary's data directory? I
don't think so, so that does not seem to be a problem.

pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
pg_checksums can potentially run for hours, so I see the point of taking
extra care here.

On the other hand, two pg_checksums running in parallel also seem not
much of a problem as the cluster is offline anyway.

What is much more of a footgun is one DBA starting pg_checksums --enable 
on a 1TB cluster, then going for lunch, and then the other DBA wondering
why the DB is down and starting the instance again.

We read the control file on pg_checksums' startup, so once pg_checksums
finishes it'll write the old checkpoint LSN into pg_control (along with
the updated checksum version). This is pilot error, but I think we
should try to guard against it.

I propose we re-read the control file for the enable case after we
have finished operating on all files and (i) check that the instance is
still offline and (ii) update the checksum version from there. That
should be a small but worthwhile change that could be done anyway.
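
Something along these lines at the end of the --enable run, reusing the
existing get_controlfile() API (untested sketch; error messages are
illustrative):

	bool		crc_ok;

	/* Re-read pg_control now that all relation files have been rewritten. */
	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
	if (!crc_ok)
	{
		fprintf(stderr, _("%s: pg_control CRC value is incorrect\n"), progname);
		exit(1);
	}

	/* Bail out if the cluster was started while we were running. */
	if (ControlFile->state != DB_SHUTDOWNED &&
		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
	{
		fprintf(stderr, _("%s: cluster was started during the operation, aborting\n"),
				progname);
		exit(1);
	}

	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
	update_controlfile(DataDir, progname, ControlFile);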

Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance: either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up (e.g. 'FATAL:  bogus data in lock file "postmaster.pid":
"pg_checksums in progress"'), or some other trigger file.

> > About fsync-ing: ISTM that it is possible that the control file is written
> > to disk while data are still not written, so a failure in between would
> > leave the cluster with an inconsistent state. I think that it should fsync
> > the data *then* update the control file and fsync again on that one.
> 
> if --enable is used, we fsync the whole data directory after writing
> all the blocks and updating the control file at the end.  The case you
> are referring to here is in fsync_pgdata(), not pg_checksums actually,
> because you could reach the same state after a simple initdb.  

But in the initdb case you don't have any valuable data in the instance
yet.

> It
> could be possible to reach a state where the control file has
> checksums enabled and some blocks are not correctly synced, still you
> would notice rather quickly if the server is in an incorrect state at
> the follow-up startup.

Would you?  I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway. 

Also, I suggest maybe adding a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.
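
Both points together would look roughly like this in pg_checksums'
main() (untested sketch):

	/* Flush all rewritten blocks before pg_control claims checksums are on. */
	if (verbose)
		printf(_("%s: syncing data directory\n"), progname);
	fsync_pgdata(DataDir, progname, PG_VERSION_NUM);

	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
	update_controlfile(DataDir, progname, ControlFile);

	/* The second flush is cheap: only pg_control is dirty at this point. */
	fsync_pgdata(DataDir, progname, PG_VERSION_NUM);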


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following provisions: https://www.credativ.de/datenschutz



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 10:44:03AM +0100, Fabien COELHO wrote:
> Yep. That is the issue I think is preventable by fsyncing updated data
> *then* writing & syncing the control file, and that should be done by
> pg_checksums.

Well, pg_rewind works similarly: control file gets updated and then
the whole data directory gets flushed.  In my opinion, the take here
is that we log something after the sync of the whole data folder is
done, so as in the event of a crash an operator can make sure that
everything has happened.
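
For instance, printing the final status message only once fsync_pgdata()
has returned (sketch; the message wording is illustrative):

	fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
	printf(_("Checksums enabled in cluster\n"));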
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO




I do not think it is a good thing that two commands can write to the data
directory at the same time, really.


We don't prevent a pg_resetwal and a pg_basebackup from running in
parallel either.  That would be...  Interesting.


Yep, I'm trying again to suggest that this kind of thing should be 
prevented. It seems that I'm pretty unconvincing.



About fsync-ing: ISTM that it is possible that the control file is written
to disk while data are still not written, so a failure in between would
leave the cluster with an inconsistent state. I think that it should fsync
the data *then* update the control file and fsync again on that one.


if --enable is used, we fsync the whole data directory after writing
all the blocks and updating the control file at the end. [...]
It could be possible to reach a state where the control file has 
checksums enabled and some blocks are not correctly synced, still you 
would notice rather quickly if the server is in an incorrect state at 
the follow-up startup.


Yep. That is the issue I think is preventable by fsyncing updated data
*then* writing & syncing the control file, and that should be done by
pg_checksums.

--
Fabien.



Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> I'm not sure of the punctuation logic on the help line: the first sentence
> does not end with a ".". I could not find an instance of this style in the
> help of other pg commands. I'd suggest "check data checksums (default)", which
> would work around that and be more in line with other commands' help.

Good idea, let's do that.

> I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> then started a "pg_checksums --enable" on a stopped cluster, then started
> the cluster while the enabling was in progress, then connected and updated
> data.

Well, yes, don't do that.  You can get into the same class of problems
while running pg_rewind, pg_basebackup or even pg_resetwal once the
initial control file check is done for each one of these tools.

> I do not think it is a good thing that two commands can write to the data
> directory at the same time, really.

We don't prevent a pg_resetwal and a pg_basebackup from running in
parallel either.  That would be...  Interesting.

> About fsync-ing: ISTM that it is possible that the control file is written
> to disk while data are still not written, so a failure in between would
> leave the cluster with an inconsistent state. I think that it should fsync
> the data *then* update the control file and fsync again on that one.

If --disable is used, the control file gets updated at the end without
doing anything else.  If the host crashes, it could be possible that
the control file has checksums enabled or disabled.  If the state is
disabled, then well it succeeded.  If the state is enabled, then the
control file is still correct, because all the other blocks still have
checksums set.
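
In other words, the whole --disable path boils down to something like
(sketch):

	/* Only pg_control is rewritten; data blocks keep their checksums. */
	ControlFile->data_checksum_version = 0;
	update_controlfile(DataDir, progname, ControlFile);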

if --enable is used, we fsync the whole data directory after writing
all the blocks and updating the control file at the end.  The case you
are referring to here is in fsync_pgdata(), not pg_checksums actually,
because you could reach the same state after a simple initdb.  It
could be possible to reach a state where the control file has
checksums enabled and some blocks are not correctly synced, still you
would notice rather quickly if the server is in an incorrect state at
the follow-up startup.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO


Michaël-san,


Now the set of patches is:
- 0001, add --enable and --disable.  I have tweaked the patch a bit so
that "action" is replaced by "mode", which is more consistent with other
tools like pg_ctl.  pg_indent was also complaining about one of the
new enum structures.


Patch applies cleanly, compiles, various make check ok, doc build ok.

I'm still at odds with the exit(1) behavior when there is nothing to do.

If this behavior is kept, I think that the documentation needs to be
improved, because "failed" does not describe "nothing needed to be done" to me.


"""
If enabling or disabling checksums, the exit status is nonzero if the 
operation failed.

"""

Maybe "... if the operation failed or the requested setting is already
active." would at least clearly describe the implemented behavior.


 +   printf(_("  -c, --check    check data checksums\n"));
 +   printf(_("                 This is the default mode if nothing is specified.\n"));

I'm not sure of the punctuation logic on the help line: the first sentence
does not end with a ".". I could not find an instance of this style in the
help of other pg commands. I'd suggest "check data checksums (default)", which
would work around that and be more in line with other commands' help.


I see a significant locking issue, which I discussed on other threads 
without convincing anyone. I could do the following things:


I slowed down pg_checksums by adding a 0.1s sleep when scanning a new
file, then started a "pg_checksums --enable" on a stopped cluster, then
started the cluster while the enabling was in progress, then connected and
updated data. Hmmm. Then I stopped the cluster while the slow enabling was
still in progress. Then I could also run a fast pg_checksums --enable in
parallel, overtaking the first one... then when the fast one finished, I
started the cluster again. When the slow one finished, it overwrote the
control file, so I had a running cluster with a control file which did not
say so, and I could then disable the checksums. Hmmm again. Ok, I could not
generate an inconsistent state, because on stopping the cluster the control
file is overwritten with the initial state from the point of view of the
postmaster, but it does not look good.


I do not think it is a good thing that two commands can write to the data 
directory at the same time, really.


About fsync-ing: ISTM that it is possible that the control file is written 
to disk while data are still not written, so a failure in between would 
leave the cluster with an inconsistent state. I think that it should fsync 
the data *then* update the control file and fsync again on that one.



- 0002, add --no-sync.


Patch applies cleanly, compiles, various make checks are ok, doc build ok.

Fine with me.

--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 07:18:32AM +0100, Fabien COELHO wrote:
> I probably can do that before next Monday. I'll prioritize reviewing the
> latest instance of this patch, though.

Thanks.  The core code of the feature has not really changed with the
last reviews, except for the tweaks in the variable names and I think
that it's in a rather committable shape.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO


Bonjour Michaël,


Yes, that would be nice; for now I have focused on pg_checksums.  For
pg_resetwal, yes, we could do it easily.  Would you like to send a patch?


I probably can do that before next Monday. I'll prioritize reviewing the 
latest instance of this patch, though.



This seems contradictory to me: you want to disable checksums, and they are
already disabled, so nothing is needed. How does that qualify as a
"failed" operation?


If the operation is automated, then a proper reaction can be taken if
multiple attempts are made.  Of course, I am fine with tuning things one
way or the other depending on the opinion of the crowd here.  From the
opinions gathered, I can see that (Michael * 2) prefer failing with
exit(1), while (Fabien * 1) would like to just do exit(0).


Yep, that sums it up:-).
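
For the record, what I have in mind amounts to something like (sketch,
using the mode naming from the latest patch):

	if (ControlFile->data_checksum_version == 0 && mode == PG_MODE_DISABLE)
	{
		printf(_("%s: data checksums are already disabled in cluster, nothing to do\n"),
			   progname);
		exit(0);
	}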


Indeed. I do not immediately see the use case where no syncing would be a
good idea. I can see why it would be a bad idea. So I'm not sure of the
concept.


To leverage the buildfarm effort I think this one is worth it.  Otherwise
we end up fsyncing the data folder a couple of times, which would make
the small-ish buildfarm machines suffer more than they need to.


Ok for the particular use-case, provided that the documentation is very
clear about the risks, which is the case, so fine with me wrt the
feature.


--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Michael Paquier
On Tue, Mar 12, 2019 at 09:44:03PM +0900, Michael Paquier wrote:
> Yes, it does not matter much in practice, but other tools just don't
> do that.  Note that changing it can be actually annoying for a
> backpatch if we don't have the --enable/--disable part, because git is
> actually smart enough to detect the file renaming across branches as
> far as I tried, but as we are refactoring this code anyway for
> --enable and --disable let's just do it, that's cleaner.

Okay, please find attached a rebased patch set.  I have committed 0001,
which adds version checks for the control file, and the renaming part
as 0002.  What remains now is the addition of the new options, and
--no-sync.  The "return 1" stuff has been switched to exit(1) while at
it, and is part of 0003.

Now the set of patches is:
- 0001, add --enable and --disable.  I have tweaked the patch a bit so
that "action" is replaced by "mode", which is more consistent with other
tools like pg_ctl.  pg_indent was also complaining about one of the
new enum structures.
- 0002, add --no-sync.

Thanks,
--
Michael
From e8118b7063a1a615dfc24f376ab3998cda67330a Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 13 Mar 2019 11:12:53 +0900
Subject: [PATCH v3 1/2] Add options to enable and disable checksums in
 pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When running --enable or --disable, the data folder gets fsync'd for
durability.  If no mode is specified in the options, then --check is
used for compatibility with older versions of pg_verify_checksums (now
renamed to pg_checksums in v12).

Author: Michael Banck
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20181221201616.gd4...@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml|  50 +++-
 src/bin/pg_checksums/pg_checksums.c   | 171 ++
 src/bin/pg_checksums/t/002_actions.pl |  76 +---
 src/tools/pgindent/typedefs.list  |   1 +
 4 files changed, 252 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..776f7be477 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  
   pg_checksums
-  verify data checksums in a PostgreSQL database cluster
+  enable, disable or check data checksums in a PostgreSQL database cluster
  
 
  
@@ -36,10 +36,19 @@ PostgreSQL documentation
  
   Description
   
-   pg_checksums verifies data checksums in a
-   PostgreSQL cluster.  The server must be shut
-   down cleanly before running pg_checksums.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   pg_checksums checks, enables or disables data
+   checksums in a PostgreSQL cluster.  The server
+   must be shut down cleanly before running
+   pg_checksums. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  
+
+  
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling will only update the file
+   pg_control.
   
  
 
@@ -60,6 +69,37 @@ PostgreSQL documentation
   
  
 
+ 
+  -c
+  --check
+  
+   
+Checks checksums. This is the default mode if nothing else is
+specified.
+   
+  
+ 
+
+ 
+  -d
+  --disable
+  
+   
+Disables checksums.
+   
+  
+ 
+
+ 
+  -e
+  --enable
+  
+   
+Enables checksums.
+   
+  
+ 
+
  
   -v
   --verbose
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 6571c34211..a7c39ac99a 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,8 @@
 /*-
  *
  * pg_checksums.c
- *	  Verifies page level checksums in an offline cluster.
+ *	  Checks, enables or disables page level checksums for an offline
+ *	  cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -17,14 +18,15 @@
 #include 
 #include 
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files 

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Michael Paquier
On Tue, Mar 12, 2019 at 10:08:19PM +0100, Fabien COELHO wrote:
> This refactoring patch is ok for me: applies, compiles, check is ok.
> However, am I right in thinking that the change should propagate to other
> tools which manipulate the control file, e.g. pg_resetwal, postmaster… so
> that there would be only one shared API to update the control file?

Yes, that would be nice; for now I have focused on pg_checksums.  For
pg_resetwal, yes, we could do it easily.  Would you like to send a patch?

> I'm wondering whether there should be something done so that the
> inter-release documentation navigation works? Should the section keep the
> former name? Is it managed by hand somewhere else? Maybe it would require
> keeping the refsect1 id, or duplicating it, not sure.

When it came to the renaming of pg_receivexlog to pg_receivewal, we
did not actually do anything in the core code, and let the magic
happen on pgsql-www.  I have also pinged pgsql-packagers about the
renaming and it is not really an issue on their side.  So I have
committed the renaming to pg_checksums as well.  So now remains only
the new options.

> In "doc/src/sgml/ref/allfiles.sgml" there seems to be a habit to align on
> the SYSTEM keyword, which is not fellowed by the patch.

Sure.  I sent an updated patch to actually fix that, and also to address
a couple of other side things I noticed on the way, like the top
refentry in the docs or the header format at the top of
pg_checksums.c, as we are tweaking the area anyway.

> This seems contradictory to me: you want to disable checksums, and they are
> already disabled, so nothing is needed. How does that qualify as a
> "failed" operation?

If the operation is automated, then a proper reaction can be taken if
multiple attempts are made.  Of course, I am fine with tuning things one
way or the other depending on the opinion of the crowd here.  From the
opinions gathered, I can see that (Michael * 2) prefer failing with
exit(1), while (Fabien * 1) would like to just do exit(0).

> Further review will come later.

Thanks, Fabien!

> Indeed. I do not immediately see the use case where no syncing would be a
> good idea. I can see why it would be a bad idea. So I'm not sure of the
> concept.

To leverage the buildfarm effort I think this one is worth it.  Otherwise
we end up fsyncing the data folder a couple of times, which would make
the small-ish buildfarm machines suffer more than they need to.

I am going to send a rebased patch set of the remaining things at the
top of the discussion as well.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Fabien COELHO


Bonjour Michaël,

Here is a partial review:


- 0001 is a patch to refactor the routine for the control file
update.  I have made it backend-aware, and we ought to be careful with
error handling, use of fds and such, something that v4 was not very
careful about.


This refactoring patch is ok for me: applies, compiles, check is ok.

However, am I right in thinking that the change should propagate to other
tools which manipulate the control file, e.g. pg_resetwal, postmaster… so
that there would be only one shared API to update the control file?
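
For instance, pg_resetwal's RewriteControlFile() writes pg_control by hand
today; with a shared routine it could, hypothetically, shrink to something
like (sketch only; field recomputation and the exact signature glossed
over):

	static void
	RewriteControlFile(void)
	{
		/* ... recompute the ControlFile fields as today ... */

		/* pg_resetwal chdir()s into the data directory, hence "." */
		update_controlfile(".", progname, &ControlFile);
	}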



- 0002 renames pg_verify_checksums to pg_checksums with a
straight-forward switch.  Docs as well as all references to
pg_verify_checksums are updated.


Looks ok to me. Applies, compiles, checks are ok. Doc build is ok.

I'm wondering whether there should be something done so that the
inter-release documentation navigation works? Should the section keep the
former name? Is it managed by hand somewhere else? Maybe it would require
keeping the refsect1 id, or duplicating it, not sure.


In "doc/src/sgml/ref/allfiles.sgml" there seems to be a habit to align on 
the SYSTEM keyword, which is not fellowed by the patch.



- 0003 adds the new options --check, --enable and --disable, with
--check being the default as discussed.


Looks like the patch I already reviewed, but I'll look at it in detail
later.


"If enabling or disabling checksums, the exit status is nonzero if the 
operation failed."


However:

 +   if (ControlFile->data_checksum_version == 0 &&
 +       action == PG_ACTION_DISABLE)
 +   {
 +       fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
 +       exit(1);
 +   }

This seems contradictory to me: you want to disable checksums, and they are
already disabled, so nothing is needed. How does that qualify as a
"failed" operation?


Further review will come later.

- 0004 adds a -N/--no-sync which I think is nice for consistency with 
other tools.  That's also useful for the tests, and was not discussed 
until now on this thread.


Indeed. I do not immediately see the use case where no syncing would be a 
good idea. I can see why it would be a bad idea. So I'm not sure of the 
concept.


--
Fabien.

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Michael Paquier
On Tue, Mar 12, 2019 at 11:13:46AM +0100, Michael Banck wrote:
> I have a feeling it is project policy to return 0 from main(), and
> exit(1) if a program aborts with an error.

Yes, it does not matter much in practice, but other tools just don't
do that.  Note that changing it can be actually annoying for a
backpatch if we don't have the --enable/--disable part, because git is
actually smart enough to detect the file renaming across branches as
far as I tried, but as we are refactoring this code anyway for
--enable and --disable let's just do it, that's cleaner.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Sergei Kornilov
Hello

Thank you for explaining. I thought so.

PS: I am not sure for now about the patch status in the CF app. I did not change the status.

regards, Sergei



Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Michael Banck
Hi,


On Monday, 2019-03-11 at 14:11, Sergei Kornilov wrote:
> > if (badblocks > 0)
> > return 1;
> 
> Small question: why return 1 instead of exit(1)?

I have a feeling it is project policy to return 0 from main(), and
exit(1) if a program aborts with an error.

In the above case, the program finishes more-or-less as intended (no
abort), but due to errors found on the way, does not return with 0.

I don't mind either way and probably exit(1) makes more sense, but I
wanted to explain why it is like that.
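
In other words, the convention is roughly (sketch; the abort condition is
just illustrative):

	int
	main(int argc, char *argv[])
	{
		/* ... scan the cluster, counting bad blocks ... */

		if (failed_to_open_a_file)	/* hypothetical condition */
			exit(1);				/* program aborts with an error */

		if (badblocks > 0)
			return 1;				/* finished as intended, but errors found */

		return 0;					/* success */
	}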


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following provisions: https://www.credativ.de/datenschutz


