Re: [HACKERS] pg_upgrade and rsync

On Wed, Jan 28, 2015 at 09:26:11PM -0800, Josh Berkus wrote:
> 3. Check that the replica is not very lagged.  If it is, wait for
> traffic to die down and for it to catch up.

Is this necessary?  It seems quite imprecise too.

> 4. Shut down the master using -m fast or -m smart for a clean shutdown.
> It is not necessary to shut down the replicas yet.

We already give instructions on how to shut down the server in the
pg_upgrade docs.

> 5. pg_upgrade the master using the --link option.  Do not start the new
> version yet.

Stephen mentioned that --link is not clear in the old docs --- I fixed
that.

> 6. create a data directory for the new version on the replica.  This
> directory should be empty; if it was initdb'd by the installation
> package, then delete its contents.

rsync will create this for you.

> 10. Start the master, then the replica

I have incorporated all your suggestions in the attached patch.  I also
split items into separate sections as you suggested.  You can read the
end result here:

http://momjian.us/tmp/pgsql/pgupgrade.html

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
new file mode 100644
index 07ca0dc..e25e0d0
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
*** tar -cf backup.tar /usr/local/pgsql/data
*** 438,445 
 Another option is to use <application>rsync</> to perform a file
 system backup.  This is done by first running <application>rsync</>
 while the database server is running, then shutting down the database
!server just long enough to do a second <application>rsync</>.  The
!second <application>rsync</> will be much quicker than the first,
 because it has relatively little data to transfer, and the end result
 will be consistent because the server was down.  This method
 allows a file system backup to be performed with minimal downtime.
--- 438,447 
 Another option is to use <application>rsync</> to perform a file
 system backup.  This is done by first running <application>rsync</>
 while the database server is running, then shutting down the database
!server long enough to do an <command>rsync --checksum</>.
!(<option>--checksum</> is necessary because <command>rsync</> only
!has file modification-time granularity of one second.)  The
!second <application>rsync</> will be quicker than the first,
 because it has relatively little data to transfer, and the end result
 will be consistent because the server was down.  This method
 allows a file system backup to be performed with minimal downtime.
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index e1cd260..a97a393
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** NET STOP postgresql-8.4
*** 315,320 
--- 315,324 
  NET STOP postgresql-9.0
  </programlisting>
  </para>
+ 
+ <para>
+  Log-shipping standby servers can remain running until a later step.
+ </para>
 </step>
  
 <step>
*** pg_upgrade.exe
*** 399,404 
--- 403,513 
 </step>
  
 <step>
+ <title>Upgrade any Log-Shipping Standby Servers</title>
+ 
+ <para>
+  If you have Log-Shipping Standby Servers (<xref
+  linkend="warm-standby">), follow these steps to upgrade them (before
+  starting any servers):
+ </para>
+ 
+ <procedure>
+ 
+  <step>
+   <title>Install the new PostgreSQL binaries on standby servers</title>
+ 
+   <para>
+Make sure the new binaries and support files are installed on all
+standby servers.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Make sure the new standby data directories do <emphasis>not</>
+   exist</title>
+ 
+   <para>
+Make sure the new standby data directories do <emphasis>not</>
+exist or are empty.  If <application>initdb</> was run, delete
+the standby server data directories.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Install custom shared object files</title>
+ 
+   <para>
+Install the same custom shared object files on the new standbys
+that you installed in the new master cluster.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Stop standby servers</title>
+ 
+   <para>
+If the standby servers are still running, stop them now using the
+above instructions.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Save configuration files</title>
+ 
+   <para>
+Save any configuration files from the standbys you need to keep,
+e.g.  <filename>postgresql.conf</>, <literal>recovery.conf</>,
+as these will be overwritten or removed in the next step.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Run <application>rsync</></title>
+ 
+   <para>
+From a directory that is above the old and new database cluster
+directories, run this

Re: [HACKERS] pg_upgrade and rsync

On Thu, Jan 29, 2015 at 10:21:30AM -0500, Andrew Dunstan wrote:
 
> On 01/29/2015 12:26 AM, Josh Berkus wrote:
>>> So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
>>> it's a bit of a complex process to get right.  On the other hand, it's
>>> far better if we put something out there along the lines of "if you
>>> really want to, this is how to do it" than having folks try to fumble
>>> through to find the correct steps themselves.
>> So, here's the correct steps for Bruce, because his current doc does not
>> cover all of these.  I really think this should go in as a numbered set
>> of steps; the current doc has some steps as steps, and other stuff
>> buried in paragraphs.
>>
>> 1. Install the new version binaries on both servers, alongside the old
>> version.
>>
>> 2. If not done by the package install, initdb the new version's data
>> directory.
>>
>> 3. Check that the replica is not very lagged.  If it is, wait for
>> traffic to die down and for it to catch up.
>>
>> 4. Shut down the master using -m fast or -m smart for a clean shutdown.
>> It is not necessary to shut down the replicas yet.
>>
>> 5. pg_upgrade the master using the --link option.  Do not start the new
>> version yet.
>>
>> 6. create a data directory for the new version on the replica.  This
>> directory should be empty; if it was initdb'd by the installation
>> package, then delete its contents.
>>
>> 7. shut down postgres on the replica.
>>
>> 8. rsync both the old and new data directories from the master to the
>> replica, using the --size-only and -H hard links options.  For example,
>> if both 9.3 and 9.4 are in /var/lib/postgresql, do:
>>
>> rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
>> replica-host:/var/lib/postgresql/
>>
>> 9. Create a recovery.conf file in the replica's data directory with the
>> appropriate parameters.
>>
>> 10. Start the master, then the replica
>
> I find steps 2 and 6 confusing.

For number 2, he is creating a new cluster on the master server.  For
#6, he is just creating an empty data directory, though this is not
required as rsync will create the directory for you.
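
For readers following along, steps 4 to 8 of the quoted list can be sketched as a dry run that only prints the commands it would issue. This is a rough illustration, not a tested procedure: the data and binary directory paths, the host name replica-host, and the function name print_plan are placeholders chosen to match the 9.3/9.4 example in the quoted text.

```shell
#!/bin/sh
# Dry-run sketch of quoted steps 4-8: prints commands, runs nothing.
# Every path and the host name below are placeholders (assumptions).

OLD=/var/lib/postgresql/9.3/main       # old cluster data directory
NEW=/var/lib/postgresql/9.4/main       # new cluster data directory
OLD_BIN=/usr/lib/postgresql/9.3/bin    # old binaries
NEW_BIN=/usr/lib/postgresql/9.4/bin    # new binaries
REPLICA=replica-host

print_plan() {
    # Step 4: clean shutdown of the old master.
    echo "pg_ctl -D $OLD -m fast stop"
    # Step 5: upgrade in link mode; the new cluster is not started yet.
    echo "pg_upgrade --link -b $OLD_BIN -B $NEW_BIN -d $OLD -D $NEW"
    # Step 7: stop postgres on the replica.
    echo "ssh $REPLICA pg_ctl -D $OLD -m fast stop"
    # Step 8: copy old and new clusters together so hard links are preserved.
    echo "rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/ $REPLICA:/var/lib/postgresql/"
}

plan=$(print_plan)
printf '%s\n' "$plan"
```

Printing the plan first makes it easier to review the ordering (master stopped and upgraded before the replica is stopped and synced) before anything is run by hand.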

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and rsync

On Tue, Jan 27, 2015 at 10:16:48PM -0500, David Steele wrote:
> This is definitely an edge case.  Not only does the file have to be
> modified in the same second *after* rsync has done the copy, but the
> file also has to not be modified in *any other subsequent second* before
> the next incremental backup.  If the file is busy enough to have a
> collision with rsync in that second, then it is very likely to be
> modified before the next incremental backup which is generally a day or
> so later.  And, of course, the backup where the issue occurs is fine -
> it's the next backup that is invalid.
>
> However, the hot/cold backup scheme as documented does make the race
> condition more likely since the two backups are done in close proximity
> temporally.  Ultimately, the most reliable method is to use checksums.
>
> For me the biggest issue is that there is no way to discover if a db is
> consistent no matter how much time/resources you are willing to spend.
> I could live with the idea of the occasional bad backup (since I keep as
> many as possible), but having no way to know whether it is good or not
> is very frustrating.  I know data checksums are a step in that
> direction, but they are a long way from providing the optimal solution.
> I've implemented rigorous checksums in PgBackRest but something closer
> to the source would be even better.

Agreed.  I have updated the two mentions of rsync in our docs to clarify
this.  Thank you.

The patch also has pg_upgrade doc improvements suggested by comments
from Josh Berkus.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
new file mode 100644
index 07ca0dc..e25e0d0
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
*** tar -cf backup.tar /usr/local/pgsql/data
*** 438,445 
 Another option is to use <application>rsync</> to perform a file
 system backup.  This is done by first running <application>rsync</>
 while the database server is running, then shutting down the database
!server just long enough to do a second <application>rsync</>.  The
!second <application>rsync</> will be much quicker than the first,
 because it has relatively little data to transfer, and the end result
 will be consistent because the server was down.  This method
 allows a file system backup to be performed with minimal downtime.
--- 438,447 
 Another option is to use <application>rsync</> to perform a file
 system backup.  This is done by first running <application>rsync</>
 while the database server is running, then shutting down the database
!server long enough to do an <command>rsync --checksum</>.
!(<option>--checksum</> is necessary because <command>rsync</> only
!has file modification-time granularity of one second.)  The
!second <application>rsync</> will be quicker than the first,
 because it has relatively little data to transfer, and the end result
 will be consistent because the server was down.  This method
 allows a file system backup to be performed with minimal downtime.
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index e1cd260..ed65def
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** pg_upgrade.exe
*** 409,414 
--- 409,504 
 </step>
  
 <step>
+ <title>Upgrade any Log-Shipping Standby Servers</title>
+ 
+ <para>
+  If you have Log-Shipping Standby Servers (<xref
+  linkend="warm-standby">), follow these steps to upgrade them (before
+  starting any servers):
+ </para>
+ 
+ <procedure>
+ 
+  <step>
+   <title>Install the new PostgreSQL binaries on standby servers</title>
+ 
+   <para>
+Make sure the new binaries and support files are installed
+on all the standby servers.  Do <emphasis>not</> run
+<application>initdb</>.  If <application>initdb</> was run, delete
+the standby server data directories.  Also, install any custom
+shared object files on the new standbys that you installed in the
+new master cluster.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Shutdown the Standby Servers</title>
+ 
+   <para>
+If the standby servers are still running, shut them down.  Save any
+configuration files from the standbys you need to keep, e.g.
+<filename>postgresql.conf</>, <literal>recovery.conf</>, as these
+will be overwritten or removed in the next step.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Run <application>rsync</></title>
+ 
+   <para>
+From a directory that is above the old and new database cluster
+directories, run this for each slave:
+ 
+ <programlisting>
+rsync --archive --hard-links --size-only old_pgdata new_pgdata remote_dir
+ </programlisting>
+ 
+where <option>old_pgdata</> and <option>new_pgdata</> are relative
+to

Re: [HACKERS] pg_upgrade and rsync

2015-01-29 Thread Andrew Dunstan



On 01/29/2015 12:26 AM, Josh Berkus wrote:

>> So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
>> it's a bit of a complex process to get right.  On the other hand, it's
>> far better if we put something out there along the lines of "if you
>> really want to, this is how to do it" than having folks try to fumble
>> through to find the correct steps themselves.
>
> So, here's the correct steps for Bruce, because his current doc does not
> cover all of these.  I really think this should go in as a numbered set
> of steps; the current doc has some steps as steps, and other stuff
> buried in paragraphs.
>
> 1. Install the new version binaries on both servers, alongside the old
> version.
>
> 2. If not done by the package install, initdb the new version's data
> directory.
>
> 3. Check that the replica is not very lagged.  If it is, wait for
> traffic to die down and for it to catch up.
>
> 4. Shut down the master using -m fast or -m smart for a clean shutdown.
> It is not necessary to shut down the replicas yet.
>
> 5. pg_upgrade the master using the --link option.  Do not start the new
> version yet.
>
> 6. create a data directory for the new version on the replica.  This
> directory should be empty; if it was initdb'd by the installation
> package, then delete its contents.
>
> 7. shut down postgres on the replica.
>
> 8. rsync both the old and new data directories from the master to the
> replica, using the --size-only and -H hard links options.  For example,
> if both 9.3 and 9.4 are in /var/lib/postgresql, do:
>
> rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
> replica-host:/var/lib/postgresql/
>
> 9. Create a recovery.conf file in the replica's data directory with the
> appropriate parameters.
>
> 10. Start the master, then the replica




I find steps 2 and 6 confusing.

cheers

andrew



Re: [HACKERS] pg_upgrade and rsync

2015-01-29 Thread Andrew Dunstan



On 01/29/2015 11:34 AM, Bruce Momjian wrote:

> On Thu, Jan 29, 2015 at 10:21:30AM -0500, Andrew Dunstan wrote:
>> On 01/29/2015 12:26 AM, Josh Berkus wrote:
>>>> So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
>>>> it's a bit of a complex process to get right.  On the other hand, it's
>>>> far better if we put something out there along the lines of "if you
>>>> really want to, this is how to do it" than having folks try to fumble
>>>> through to find the correct steps themselves.
>>>
>>> So, here's the correct steps for Bruce, because his current doc does not
>>> cover all of these.  I really think this should go in as a numbered set
>>> of steps; the current doc has some steps as steps, and other stuff
>>> buried in paragraphs.
>>>
>>> 1. Install the new version binaries on both servers, alongside the old
>>> version.
>>>
>>> 2. If not done by the package install, initdb the new version's data
>>> directory.
>>>
>>> 3. Check that the replica is not very lagged.  If it is, wait for
>>> traffic to die down and for it to catch up.
>>>
>>> 4. Shut down the master using -m fast or -m smart for a clean shutdown.
>>> It is not necessary to shut down the replicas yet.
>>>
>>> 5. pg_upgrade the master using the --link option.  Do not start the new
>>> version yet.
>>>
>>> 6. create a data directory for the new version on the replica.  This
>>> directory should be empty; if it was initdb'd by the installation
>>> package, then delete its contents.
>>>
>>> 7. shut down postgres on the replica.
>>>
>>> 8. rsync both the old and new data directories from the master to the
>>> replica, using the --size-only and -H hard links options.  For example,
>>> if both 9.3 and 9.4 are in /var/lib/postgresql, do:
>>>
>>> rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
>>> replica-host:/var/lib/postgresql/
>>>
>>> 9. Create a recovery.conf file in the replica's data directory with the
>>> appropriate parameters.
>>>
>>> 10. Start the master, then the replica
>>
>> I find steps 2 and 6 confusing.
>
> For number 2, he is creating a new cluster on the master server.  For
> #6, he is just creating an empty data directory, though this is not
> required as rsync will create the directory for you.




Then step 2 should specify that it's for the master.

cheers

andrew




Re: [HACKERS] pg_upgrade and rsync

On Thu, Jan 29, 2015 at 12:09:58PM -0500, Andrew Dunstan wrote:
>>>> 7. shut down postgres on the replica.
>>>>
>>>> 8. rsync both the old and new data directories from the master to the
>>>> replica, using the --size-only and -H hard links options.  For example,
>>>> if both 9.3 and 9.4 are in /var/lib/postgresql, do:
>>>>
>>>> rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
>>>> replica-host:/var/lib/postgresql/
>>>>
>>>> 9. Create a recovery.conf file in the replica's data directory with the
>>>> appropriate parameters.
>>>>
>>>> 10. Start the master, then the replica
>>>
>>> I find steps 2 and 6 confusing.
>> For number 2, he is creating a new cluster on the master server.  For
>> #6, he is just creating an empty data directory, though this is not
>> required as rsync will create the directory for you.
>
> Then step 2 should specify that it's for the master.

Right.  Josh is just listing all the steps --- the pg_upgrade docs
already have that spelled out in detail.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +



Re: [HACKERS] pg_upgrade and rsync

2015-01-29 Thread Josh Berkus

On 01/29/2015 09:11 AM, Bruce Momjian wrote:
> On Thu, Jan 29, 2015 at 12:09:58PM -0500, Andrew Dunstan wrote:
>> Then step 2 should specify that it's for the master.
>
> Right.  Josh is just listing all the steps --- the pg_upgrade docs
> already have that spelled out in detail.

What I'm also saying is that, if we expect anyone to be able to follow
all of these steps, it has to be very explicit; just saying "Follow the
pg_upgrade docs but don't start the master yet" isn't clear enough,
because the pg_upgrade docs have a few alternative paths.

On the whole, I personally would never follow this procedure at a
production site.  It's way too fragile and easy to screw up.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_upgrade and rsync

On 1/29/15 12:42 PM, Josh Berkus wrote:
> On 01/29/2015 09:11 AM, Bruce Momjian wrote:
>> On Thu, Jan 29, 2015 at 12:09:58PM -0500, Andrew Dunstan wrote:
>>> Then step 2 should specify that it's for the master.
>> Right.  Josh is just listing all the steps --- the pg_upgrade docs
>> already have that spelled out in detail.
> What I'm also saying is that, if we expect anyone to be able to follow
> all of these steps, it has to be very explicit; just saying "Follow the
> pg_upgrade docs but don't start the master yet" isn't clear enough,
> because the pg_upgrade docs have a few alternative paths.
>
> On the whole, I personally would never follow this procedure at a
> production site.  It's way too fragile and easy to screw up.

I'm in agreement with Josh - I would not use this method.  I may be
wrong, but it makes me extremely nervous.

I prefer to upgrade the primary and get it back up as soon as possible,
then take a backup and restore it to the replicas.  If the replicas are
being used for read-only queries instead of just redundancy then I
redirect that traffic to the primary while the replicas are being
upgraded and restored.  This method has the least downtime for the primary.

If you want less downtime overall then it's best to use the hot rsync /
cold rsync with checksums method, though this depends a lot on the size
of your database.

Ultimately, there is no single best method.  It depends a lot on your
environment.  I would prefer the official documents to contain very safe
methods.

-- 
- David Steele
da...@pgmasters.net




signature.asc
Description: OpenPGP digital signature

Re: [HACKERS] pg_upgrade and rsync

2015-01-29 Thread Jim Nasby


On 1/29/15 7:02 PM, David Steele wrote:

> On 1/29/15 7:55 PM, Jim Nasby wrote:
>> On 1/29/15 6:25 PM, David Steele wrote:
>>> Safe backups can be done without LSNs provided you are willing to trust
>>> your timestamps.
>>
>> Which AFAICT simply isn't safe to do at all... except maybe with the
>> manifest stuff you've talked about?
>
> Yes - that's what I'm talking about.  I had hoped to speak about this at
> PgConfNYC, but perhaps I can do it in a lightning talk instead.


Sounds like maybe it should be part of our documentation too...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] pg_upgrade and rsync

2015-01-29 Thread Jim Nasby


On 1/29/15 5:53 PM, David Steele wrote:

> On 1/29/15 12:42 PM, Josh Berkus wrote:
>> On 01/29/2015 09:11 AM, Bruce Momjian wrote:
>>> On Thu, Jan 29, 2015 at 12:09:58PM -0500, Andrew Dunstan wrote:
>>>> Then step 2 should specify that it's for the master.
>>> Right.  Josh is just listing all the steps --- the pg_upgrade docs
>>> already have that spelled out in detail.
>> What I'm also saying is that, if we expect anyone to be able to follow
>> all of these steps, it has to be very explicit; just saying "Follow the
>> pg_upgrade docs but don't start the master yet" isn't clear enough,
>> because the pg_upgrade docs have a few alternative paths.
>>
>> On the whole, I personally would never follow this procedure at a
>> production site.  It's way too fragile and easy to screw up.
>
> I'm in agreement with Josh - I would not use this method.  I may be
> wrong, but it makes me extremely nervous.
>
> I prefer to upgrade the primary and get it back up as soon as possible,
> then take a backup and restore it to the replicas.  If the replicas are
> being used for read-only queries instead of just redundancy then I
> redirect that traffic to the primary while the replicas are being
> upgraded and restored.  This method has the least downtime for the primary.
>
> If you want less downtime overall then it's best to use the hot rsync /
> cold rsync with checksums method, though this depends a lot on the size
> of your database.
>
> Ultimately, there is no single best method.  It depends a lot on your
> environment.  I would prefer the official documents to contain very safe
> methods.


How do we define safe though? Your method leaves you without a backup server
until your base backup completes and the replica catches up. I think we do a
disservice to our users by not pointing that out and providing a potential
alternate *so long as we spell out the tradeoffs/risks*.

Ultimately, I think this thread really shows the very large need for a tool
that understands things like LSNs to provide rsync-ish behavior that's actually
safe.

FWIW, I personally am very leery of relying on pg_upgrade. It's too easy to
introduce bugs, doesn't handle all cases, and provides no option for going back to
your previous version without losing data. I much prefer old_version --> londiste
--> new_version, and then doing the upgrade by reversing the direction of
replication.

I also don't entirely trust PITR backups. It's too easy to accidentally break 
them in subtle ways.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] pg_upgrade and rsync

2015-01-29 Thread Jim Nasby


On 1/29/15 6:25 PM, David Steele wrote:

> Safe backups can be done without LSNs provided you are willing to trust
> your timestamps.

Which AFAICT simply isn't safe to do at all... except maybe with the manifest
stuff you've talked about?

>> FWIW, I personally am very leery of relying on pg_upgrade. It's too
>> easy to introduce bugs, doesn't handle all cases, and provides no
>> option for going back to your previous version without losing data. I
>> much prefer old_version --> londiste --> new_version, and then doing
>> the upgrade by reversing the direction of replication.
>
> I think the official docs need to stick with options that are core?

I don't think we have any such requirement. IIRC the docs used to talk about
using logical replication before we had pg_upgrade (and may have actually
called out Slony).

> I avoid pg_upgrade wherever it is practical.  However, sometimes it
> really is the best option.

Certainly. I think what we should be doing is spelling out the available
options (with pros/cons) so that users can decide what's best.

>> I also don't entirely trust PITR backups. It's too easy to
>> accidentally break them in subtle ways.
>
> Agreed in general, but I've been doing a lot of work to make this not be
> true anymore.


:)

I'd love to see all this stuff Just Work (tm), but I don't think we're there 
yet, and I'm not really sure how we can get there.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] pg_upgrade and rsync

On 1/29/15 11:34 AM, Bruce Momjian wrote:

>> 3. Check that the replica is not very lagged.  If it is, wait for
>> traffic to die down and for it to catch up.

I think I'd want something a bit more specific here.  When the primary
shuts down it will kick out one last WAL.  The filename should be recorded.

>> 7. shut down postgres on the replica.

Before the shutdown make sure that the replicas are waiting on the
subsequent log file to appear (note that versions prior to 9.3 skip
00).  That means all WAL has been consumed and the primary and
replica(s) are in the same state.

This is a bit more complex if streaming replication is being used
*without* good old fashioned log shipping to a backup server and I'm not
sure exactly how to go about it.  I suppose you could start Postgres in
single user mode, commit a transaction, and make sure that transaction
gets to the replicas.

OTOH, streaming replication (unless it is synchronous) would be crazy
without doing WAL backup.  Maybe that's just me.
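
One way to make the "same state" check concrete is to compare the "Latest checkpoint location" line that pg_controldata prints for the cleanly shut down primary and for each stopped replica. Note that this is a different check from watching WAL file names as described above, and it is only a sketch: the function name latest_checkpoint and the sample output values are made up for illustration.

```shell
#!/bin/sh
# latest_checkpoint: read pg_controldata output on stdin and print the
# value of its "Latest checkpoint location" line.
latest_checkpoint() {
    awk -F': *' '/^Latest checkpoint location:/ { print $2 }'
}

# Made-up sample lines standing in for real pg_controldata output.
primary_out='Database cluster state:               shut down
Latest checkpoint location:           0/3000028'
replica_out='Database cluster state:               shut down in recovery
Latest checkpoint location:           0/3000028'

primary_lsn=$(printf '%s\n' "$primary_out" | latest_checkpoint)
replica_lsn=$(printf '%s\n' "$replica_out" | latest_checkpoint)

# Equal locations suggest the replica replayed everything the primary wrote.
if [ "$primary_lsn" = "$replica_lsn" ]; then
    state="in sync"
else
    state="replica behind"
fi
echo "primary=$primary_lsn replica=$replica_lsn: $state"
```

Against real clusters the input would come from something like `pg_controldata "$PGDATA" | latest_checkpoint`, run on each host after it has been stopped.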

-- 
- David Steele
da...@pgmasters.net




Re: [HACKERS] pg_upgrade and rsync

On 1/29/15 7:55 PM, Jim Nasby wrote:
> On 1/29/15 6:25 PM, David Steele wrote:
>> Safe backups can be done without LSNs provided you are willing to trust
>> your timestamps.
>
> Which AFAICT simply isn't safe to do at all... except maybe with the
> manifest stuff you've talked about?

Yes - that's what I'm talking about.  I had hoped to speak about this at
PgConfNYC, but perhaps I can do it in a lightning talk instead.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] pg_upgrade and rsync

On 1/29/15 10:13 AM, Bruce Momjian wrote:
> Agreed.  I have updated the two mentions of rsync in our docs to clarify
> this.  Thank you.
>
> The patch also has pg_upgrade doc improvements suggested by comments
> from Josh Berkus.

It's very good to see this.  Mentions of this rsync vulnerability are
few and far between.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] pg_upgrade and rsync

On 1/29/15 7:07 PM, Jim Nasby wrote:
>> Ultimately, there is no single best method.  It depends a lot on your
>> environment.  I would prefer the official documents to contain very safe
>> methods.
>
> How do we define safe though? Your method leaves you without a backup
> server until your base backup completes and the replica catches up. I
> think we do a disservice to our users by not pointing that out and
> providing a potential alternate *so long as we spell out the
> tradeoffs/risks*.

My method leaves you without a replica, but not without a *backup* as
long as you are shipping WAL somewhere safe.  You can set
archive_timeout to something small if you want to make this safer.  This
is more practical in 9.4 since unused WAL space is zeroed.

OK, I'm willing to admit it would be better to have the option with all
caveats, so long as they are strongly worded.

> Ultimately, I think this thread really shows the very large need for a
> tool that understands things like LSNs to provide rsync-ish behavior
> that's actually safe.

Safe backups can be done without LSNs provided you are willing to trust
your timestamps.

> FWIW, I personally am very leery of relying on pg_upgrade. It's too
> easy to introduce bugs, doesn't handle all cases, and provides no
> option for going back to your previous version without losing data. I
> much prefer old_version --> londiste --> new_version, and then doing
> the upgrade by reversing the direction of replication.

I think the official docs need to stick with options that are core?

I avoid pg_upgrade wherever it is practical.  However, sometimes it
really is the best option.

> I also don't entirely trust PITR backups. It's too easy to
> accidentally break them in subtle ways.

Agreed in general, but I've been doing a lot of work to make this not be
true anymore.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] pg_upgrade and rsync

On 1/29/15 8:09 PM, Jim Nasby wrote:
> On 1/29/15 7:02 PM, David Steele wrote:
>> On 1/29/15 7:55 PM, Jim Nasby wrote:
>>> On 1/29/15 6:25 PM, David Steele wrote:
>>>> Safe backups can be done without LSNs provided you are willing to
>>>> trust your timestamps.
>>>
>>> Which AFAICT simply isn't safe to do at all... except maybe with the
>>> manifest stuff you've talked about?
>>
>> Yes - that's what I'm talking about.  I had hoped to speak about this at
>> PgConfNYC, but perhaps I can do it in a lightning talk instead.
>
> Sounds like maybe it should be part of our documentation too...

I think the warnings Bruce has added to the documentation about using
checksums are sufficient for now.  The manifest build and delay
methodology are part of PgBackRest, the backup solution I'm working on
as an alternative to barman, etc.  It's not something that can be
implemented trivially.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] pg_upgrade and rsync

* Jim Nasby (jim.na...@bluetreble.com) wrote:
> On 1/27/15 9:29 AM, Stephen Frost wrote:
>>> My point is that Bruce's patch suggests looking for remote_dir in
>>> the rsync documentation, but no such term appears there.
>> Ah, well, perhaps we could simply add a bit of clarification to this:
>>
>> for details on specifying <option>remote_dir</>
>
> The whole remote_dir discussion made me think of something... would
> --link-dest be any help here?

No.

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Jan 27, 2015 at 09:36:58AM -0500, Stephen Frost wrote:
>> The example listed works, but only when it's a local rsync:
>>
>> rsync --archive --hard-links --size-only old_dir new_dir remote_dir
>>
>> Perhaps a better example (or additional one) would be with a remote
>> rsync, including clarification of old and new dir, like so:
>>
>> (run in /var/lib/postgresql)
>> rsync --archive --hard-links --size-only \
>>   9.3/main \
>>   9.4/main \
>>   server:/var/lib/postgresql/
>>
>> Note that 9.3/main and 9.4/main are two source directories for rsync to
>> copy over, while server:/var/lib/postgresql/ is a remote destination
>> directory.  The above directories match a default Debian/Ubuntu install.
>
> OK, sorry everyone was confused by 'remote_dir'.  Does this new patch
> help?

Looks better, but --links is not the same as --hard-links.  The example
is right, but the documentation below it mentions <option>--link</>
which is for symlinks, not hard links.

This also should really include a discussion about dealing with
tablespaces, since the example command won't deal with them.

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

* Bruce Momjian (br...@momjian.us) wrote:
 Interesting problem, but doesn't rsync use sub-second accuracy?

No.  A simple test will show:

touch xx/aa ; rsync -avv xx yy ; sleep 0.5 ; touch xx/aa ; rsync -avv xx yy

Run that a few times and you'll see it report xx/aa is uptodate
sometimes, depending on exactly where the sleep falls during the
second.
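The effect can be sketched without rsync at all.  The snippet below is a
minimal illustration, assuming (as the test above suggests) that the quick
check compares file size and the modification time truncated to whole
seconds.  quick_check_same is a hypothetical helper, not an rsync function,
and the sizes and timestamps are made-up values.

```shell
# Hypothetical helper (not part of rsync): mimic a quick check that
# compares size and whole-second mtime only.
# Arguments: size_a mtime_a size_b mtime_b, where each mtime is given in
# seconds with an optional fractional part, e.g. 1422400000.250
quick_check_same() {
    [ "$1" = "$3" ] && [ "${2%%.*}" = "${4%%.*}" ]
}

# Two touches half a second apart, inside the same second, look identical:
if quick_check_same 8192 1422400000.250 8192 1422400000.750; then
    echo "same second: change not detected"
fi

# Across a second boundary the change is seen:
if ! quick_check_same 8192 1422400000.750 8192 1422400001.250; then
    echo "next second: change detected"
fi
```

Whether the second touch is noticed depends only on which side of the second
boundary it falls, which matches the "sometimes" in the test above.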

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

Bruce, Stephen, etc.:

So, I did a test trial of this and it seems like it didn't solve the
issue of huge rsyncs.

That is, the only reason to do this whole business via rsync, instead of
doing a new basebackup of each replica, is to cut down on data transfer
time by not resyncing the data from the old base directory.  But in
practice, the majority of the database files seem like they get
transmitted anyway.  Maybe I'm misreading the rsync output?

Here's the setup:

3 Ubuntu 14.04 servers on AWS (tiny instance)
Running PostgreSQL 9.3.5
Set up in cascading replication

108 --> 107 --> 109

The goal was to test this with cascading, but I didn't get that far.

I set up a pgbench workload, read-write on the master and read-only on
the two replicas, to simulate a load-balanced workload.  I was *not*
logging hint bits.

I then followed this sequence:

1) Install 9.4 packages on all servers.
2) Shut down the master.
3) pg_upgrade the master using --link
4) shut down replica 107
5) rsync the master's $PGDATA from the replica:

rsync -aHv --size-only -e ssh --itemize-changes \
    172.31.4.108:/var/lib/postgresql/ /var/lib/postgresql/

... and got:

.d..t.. 9.4/main/pg_xlog/
f+ 9.4/main/pg_xlog/0007000100CB
.d..t.. 9.4/main/pg_xlog/archive_status/

sent 126892 bytes  received 408645000 bytes  7640596.11 bytes/sec
total size is 671135675  speedup is 1.64

So that's 390MB of data transfer.

If I look at the original directory:

postgres@paul: du --max-depth=1 -h
4.0K    ./.cache
20K     ./.ssh
424M    ./9.3
4.0K    ./.emacs.d
51M     ./9.4
56K     ./bench
474M    .

So 390MB were transferred out of a possible 474MB.  That certainly seems
like we're still transferring the majority of the data, even though I
verified that the hard links are being sent as hard links.  No?
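One way to confirm after the fact that two paths really are hard links to
the same file is the shell's -ef test, which is true when both names refer
to the same device and inode.  A minimal sketch, using scratch files as
stand-ins for relation files (the directory layout and the is_hard_linked
helper are made up for illustration):

```shell
# -ef is true when both names refer to the same device and inode.
is_hard_linked() {
    [ "$1" -ef "$2" ]
}

# Scratch files standing in for an old and a new cluster (assumption).
tmp=$(mktemp -d)
mkdir -p "$tmp/9.3/base" "$tmp/9.4/base"
echo "heap data" > "$tmp/9.3/base/28188"
ln "$tmp/9.3/base/28188" "$tmp/9.4/base/28188"   # a hard link, as link mode makes
cp "$tmp/9.3/base/28188" "$tmp/9.4/base/copy"    # a plain copy, for contrast

is_hard_linked "$tmp/9.3/base/28188" "$tmp/9.4/base/28188" && echo "linked"
is_hard_linked "$tmp/9.3/base/28188" "$tmp/9.4/base/copy" || echo "separate copy"

rm -rf "$tmp"
```

On a real replica the two arguments would be a file under the old data
directory and its counterpart under the new one.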

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and rsync

On 01/28/2015 02:10 PM, Josh Berkus wrote:
 So 390MB were transferred out of a possible 474MB.  That certainly seems
 like we're still transferring the majority of the data, even though I
 verified that the hard links are being sent as hard links.  No?

Looks like the majority of that was pg_xlog.  Going to tear this down
and start over, and --exclude pg_xlog.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_upgrade and rsync

On 01/28/2015 02:28 PM, Josh Berkus wrote:
 On 01/28/2015 02:10 PM, Josh Berkus wrote:
 So 390MB were transferred out of a possible 474MB.  That certainly seems
 like we're still transferring the majority of the data, even though I
 verified that the hard links are being sent as hard links.  No?
 
 Looks like the majority of that was pg_xlog.  Going to tear this down
 and start over, and --exclude pg_xlog.
 

So, having redone this without the pg_xlog lag, this appears to work in
terms of cutting down the rsync volume.
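For reference, a sketch of what the retried pull might look like.  The exact
command was not posted, so this is an assumption: it reuses the flags from
the first attempt and adds --exclude=pg_xlog as described.  The wrapper
function and the RSYNC override variable are hypothetical conveniences, and
whether the replica can safely start without the copied WAL is not settled
in this thread.

```shell
# RSYNC can be overridden for testing; it defaults to the real rsync.
RSYNC=${RSYNC:-rsync}

# Hypothetical wrapper: pull SRC to DEST with hard links preserved,
# comparing by size only and skipping anything named pg_xlog.
pull_without_xlog() {
    "$RSYNC" -aHv --size-only --itemize-changes -e ssh \
        --exclude=pg_xlog \
        "$1" "$2"
}

# Example invocation, commented out so nothing is transferred by accident:
# pull_without_xlog 172.31.4.108:/var/lib/postgresql/ /var/lib/postgresql/
```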

I'm concerned about putting this in the main docs, though.  This is a
complex and fragile procedure that is very easy to get wrong and
hard to explain for a generic case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_upgrade and rsync

* Josh Berkus (j...@agliodbs.com) wrote:
 On 01/28/2015 02:28 PM, Josh Berkus wrote:
  On 01/28/2015 02:10 PM, Josh Berkus wrote:
  So 390MB were transferred out of a possible 474MB.  That certainly seems
  like we're still transferring the majority of the data, even though I
  verified that the hard links are being sent as hard links.  No?
  
  Looks like the majority of that was pg_xlog.  Going to tear this down
  and start over, and --exclude pg_xlog.
  
 
 So, having redone this without the pg_xlog lag, this appears to work in
 terms of cutting down the rsync volume.
 
 I'm concerned about putting this in the main docs, though.  This is a
 complex, and fragile procedure, which is very easy to get wrong, and
 hard to explain for a generic case.

So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
it's a bit of a complex process to get right.  On the other hand, it's
far better if we put something out there along the lines of "if you
really want to, this is how to do it" than having folks try to fumble
through to find the correct steps themselves.

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync


 So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
 it's a bit of a complex process to get right.  On the other hand, it's
 far better if we put something out there along the lines of if you
 really want to, this is how to do it than having folks try to fumble
 through to find the correct steps themselves.

So, here are the correct steps for Bruce, because his current doc does not
cover all of these.  I really think this should go in as a numbered set
of steps; the current doc has some steps as steps, and other stuff
buried in paragraphs.

1. Install the new version binaries on both servers, alongside the old
version.

2. If not done by the package install, initdb the new version's data
directory.

3. Check that the replica is not very lagged.  If it is, wait for
traffic to die down and for it to catch up.

4. Shut down the master using -m fast or -m smart for a clean shutdown.
 It is not necessary to shut down the replicas yet.

5. pg_upgrade the master using the --link option.  Do not start the new
version yet.

6. Create a data directory for the new version on the replica.  This
directory should be empty; if it was initdb'd by the installation
package, then delete its contents.

7. Shut down postgres on the replica.

8. rsync both the old and new data directories from the master to the
replica, using the --size-only and -H hard links options.  For example,
if both 9.3 and 9.4 are in /var/lib/postgresql, do:

rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/ \
    replica-host:/var/lib/postgresql/

9. Create a recovery.conf file in the replica's data directory with the
appropriate parameters.

10. Start the master, then the replica.
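The command-line steps above (4, 5, 7, 8 and 10) can be sketched as a
dry-run script.  This is an illustration under stated assumptions, not a
tested procedure: the binary and data paths assume a Debian/Ubuntu-style
layout, replica-host is a placeholder, and the run and upgrade_with_replica
helpers are hypothetical.  Package installation, the lag check, preparing
the replica directory and recovery.conf (steps 1-3, 6 and 9) are not
covered.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Assumed locations; adjust before any real use.
OLD_BIN=/usr/lib/postgresql/9.3/bin
NEW_BIN=/usr/lib/postgresql/9.4/bin
OLD_DATA=/var/lib/postgresql/9.3/main
NEW_DATA=/var/lib/postgresql/9.4/main
REPLICA=replica-host

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_with_replica() {
    # Step 4: clean shutdown of the master.
    run "$OLD_BIN/pg_ctl" -D "$OLD_DATA" stop -m fast
    # Step 5: upgrade in link mode; the new cluster is not started yet.
    run "$NEW_BIN/pg_upgrade" --link -b "$OLD_BIN" -B "$NEW_BIN" \
        -d "$OLD_DATA" -D "$NEW_DATA"
    # Step 7: shut down postgres on the replica.
    run ssh "$REPLICA" "$OLD_BIN/pg_ctl" -D "$OLD_DATA" stop -m fast
    # Step 8: send both trees so the hard links can be recreated remotely.
    run rsync -aHv --size-only -e ssh --itemize-changes \
        /var/lib/postgresql/ "$REPLICA:/var/lib/postgresql/"
    # Step 10: start the new master; the replica is started afterwards.
    run "$NEW_BIN/pg_ctl" -D "$NEW_DATA" start
}

upgrade_with_replica
```

Running it as-is only prints the five commands in order, which makes it
easy to review them before setting DRY_RUN=0.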


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_upgrade and rsync

On Sat, Jan 24, 2015 at 10:04 PM, Bruce Momjian br...@momjian.us wrote:
 On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
   You'd have to replace the existing data directory on the master to do
   that, which pg_upgrade was designed specifically to not do, in case
   things went poorly.
 
  Why? Just rsync the new data directory onto the old directory on the
  standbys. That's fine and simple.

 That still doesn't address the need to use --size-only, it would just
 mean that you don't need to use -H.  If anything the -H part is the
 aspect which worries me the least about this approach.

 I can now confirm that it works, just as Stephen said.  I was able to
 upgrade a standby cluster that contained the regression database, and
 the pg_dump output was perfect.

 I am attaching doc instruction that I will add to all branches as soon
 as someone else confirms my results.  You will need to use rsync
 --itemize-changes to see the hard links being created, e.g.:

hf+ pgsql/data/base/16415/28188 = pgsql.old/data/base/16384/28188

My rsync manual page (on two different systems) mentions nothing about
remote_dir, so I'd be quite unable to follow your proposed directions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread David Steele

On 1/27/15 6:09 PM, Jim Nasby wrote:
 The whole remote_dir discussion made me think of something... would
 --link-dest be any help here?

I'm pretty sure --link-dest would not be effective in this case.  The
problem exists on the source side and --link-dest only operates on the
destination.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund and...@2ndquadrant.com wrote:
 I don't understand why that'd be better than simply fixing (yes, that's
 imo the correct term) pg_upgrade to retain relfilenodes across the
 upgrade. Afaics there's no conflict risk and it'd make the clusters much
 more similar, which would be good; independent of rsyncing standbys.

 +1.

That's certainly impossible for the system catalogs, which means you
have to be able to deal with relfilenode discrepancies for them, which
means that maintaining the same relfilenodes for user tables is of
dubious value.

regards, tom lane



Re: [HACKERS] pg_upgrade and rsync

On Tue, Jan 27, 2015 at 9:36 AM, Stephen Frost sfr...@snowman.net wrote:
 The example listed works, but only when it's a local rsync:

 rsync --archive --hard-links --size-only old_dir new_dir remote_dir

 Perhaps a better example (or additional one) would be with a remote
 rsync, including clarification of old and new dir, like so:

 (run in /var/lib/postgresql)
 rsync --archive --hard-links --size-only \
   9.3/main \
   9.4/main \
   server:/var/lib/postgresql/

 Note that 9.3/main and 9.4/main are two source directories for rsync to
 copy over, while server:/var/lib/postgresql/ is a remote destination
 directory.  The above directories match a default Debian/Ubuntu install.

My point is that Bruce's patch suggests looking for remote_dir in
the rsync documentation, but no such term appears there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade and rsync

On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
 I don't understand why that'd be better than simply fixing (yes, that's
 imo the correct term) pg_upgrade to retain relfilenodes across the
 upgrade. Afaics there's no conflict risk and it'd make the clusters much
 more similar, which would be good; independent of rsyncing standbys.

 +1.

 That's certainly impossible for the system catalogs, which means you
 have to be able to deal with relfilenode discrepancies for them, which
 means that maintaining the same relfilenodes for user tables is of
 dubious value.

Why is that impossible for the system catalogs?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
 On Sat, Jan 24, 2015 at 10:04 PM, Bruce Momjian br...@momjian.us wrote:
  On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
You'd have to replace the existing data directory on the master to do
that, which pg_upgrade was designed specifically to not do, in case
things went poorly.
  
   Why? Just rsync the new data directory onto the old directory on the
   standbys. That's fine and simple.
 
  That still doesn't address the need to use --size-only, it would just
  mean that you don't need to use -H.  If anything the -H part is the
  aspect which worries me the least about this approach.
 
  I can now confirm that it works, just as Stephen said.  I was able to
  upgrade a standby cluster that contained the regression database, and
  the pg_dump output was perfect.
 
  I am attaching doc instruction that I will add to all branches as soon
  as someone else confirms my results.  You will need to use rsync
  --itemize-changes to see the hard links being created, e.g.:
 
 hf+ pgsql/data/base/16415/28188 = 
  pgsql.old/data/base/16384/28188
 
 My rsync manual page (on two different systems) mentions nothing about
 remote_dir, so I'd be quite unable to follow your proposed directions.

The example listed works, but only when it's a local rsync:

rsync --archive --hard-links --size-only old_dir new_dir remote_dir

Perhaps a better example (or additional one) would be with a remote
rsync, including clarification of old and new dir, like so:

(run in /var/lib/postgresql)
rsync --archive --hard-links --size-only \
  9.3/main \
  9.4/main \
  server:/var/lib/postgresql/

Note that 9.3/main and 9.4/main are two source directories for rsync to
copy over, while server:/var/lib/postgresql/ is a remote destination
directory.  The above directories match a default Debian/Ubuntu install.

Thanks!

Stephen



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Jim Nasby


On 1/27/15 9:29 AM, Stephen Frost wrote:

  My point is that Bruce's patch suggests looking for remote_dir in
  the rsync documentation, but no such term appears there.

 Ah, well, perhaps we could simply add a bit of clarification to this:

 for details on specifying <option>remote_dir</>


The whole remote_dir discussion made me think of something... would
--link-dest be any help here?

   --link-dest=DIR
          This option behaves like --copy-dest, but unchanged files are
          hard linked from DIR to the destination directory.  The files
          must be identical in all preserved attributes (e.g. permissions,
          possibly ownership) in order for the files to be linked
          together.  An example:

rsync -av --link-dest=$PWD/prior_dir host:src_dir/ new_dir/

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Bruce Momjian

On Mon, Jan 26, 2015 at 05:41:59PM -0500, Stephen Frost wrote:
 I've thought about it a fair bit actually and I agree that there is some
 risk to using rsync for *incremental* base backups.  That is, you have
 a setup where you loop with:
 
 pg_start_backup
 rsync -> dest
 pg_stop_backup
 
 without using -I, changing what 'dest' is, or making sure it's empty
 every time.  The problem is the 1s-level granularity used on the
 timestamp.  A possible set of operations, all within 1s, is:
 
 file changed
 rsync starts copying the file
 file changed again (somewhere prior to where rsync is at)
 rsync finishes the file copy
 
 Now, this isn't actually a problem for the first time that file is
 backed up- the issue is if that file isn't changed again.  rsync won't
 re-copy it, but that change that rsync missed won't be in the WAL
 history for the *second* backup that's done (only the first), leading to
 a case where that file would end up corrupted.

Interesting problem, but doesn't rsync use sub-second accuracy?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread David Steele

On 1/27/15 9:51 PM, Bruce Momjian wrote:
  According to my empirical testing on Linux and OSX the answer is no:
  rsync does not use sub-second accuracy.  This seems to be true even on
  file systems like ext4 that support millisecond mod times, at least it
  was true on Ubuntu 12.04 running ext4.

  Even on my laptop there is a full half-second of vulnerability for
  rsync.  Faster systems may have a larger window.

 OK, bummer.  Well, I don't think we ever recommend to run rsync without
 checksums, but the big problem is that rsync doesn't do checksums by
 default.  :-(

 pg_upgrade recommends using two rsyncs:

    To make a valid copy of the old cluster, use <command>rsync</> to create
    a dirty copy of the old cluster while the server is running, then shut
    down the old server and run <command>rsync</> again to update the copy
    with any changes to make it consistent.  You might want to exclude some

 I am afraid that will not work as it could miss changes, right?  When
 would the default mod-time checking ever be safe?

According to my testing the default mod-time checking is never
completely safe in rsync.  I've worked around this in PgBackRest by
building the manifest and then waiting until the start of the next
second before starting to copy.  It was the only way I could make the
incremental backups reliable without requiring checksums (which are
optional as in rsync for performance).  Of course, you still have to
trust the clock for this to work.
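The "wait until the start of the next second" idea can be sketched in a few
lines of shell.  This is only an illustration of the technique, not
PgBackRest's actual code; wait_for_next_second is a hypothetical name, and
the sketch assumes a date that supports +%s and a sleep that accepts
fractional seconds.

```shell
# Hypothetical helper: block until the wall clock ticks over to a new second.
# Any file modified before the call then has an mtime in an earlier second
# than anything copied after it returns.
wait_for_next_second() {
    start=$(date +%s)
    while [ "$(date +%s)" = "$start" ]; do
        sleep 0.1
    done
}

# Intended use: build the manifest, wait, then start copying.
wait_for_next_second
echo "copy may start at $(date +%s)"
```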

This is definitely an edge case.  Not only does the file have to be
modified in the same second *after* rsync has done the copy, but the
file also has to not be modified in *any other subsequent second* before
the next incremental backup.  If the file is busy enough to have a
collision with rsync in that second, then it is very likely to be
modified before the next incremental backup which is generally a day or
so later.  And, of course, the backup where the issue occurs is fine -
it's the next backup that is invalid.

However, the hot/cold backup scheme as documented does make the race
condition more likely since the two backups are done in close proximity
temporally.  Ultimately, the most reliable method is to use checksums.

For me the biggest issue is that there is no way to discover if a db in
is consistent no matter how much time/resources you are willing to spend. 
I could live with the idea of the occasional bad backup (since I keep as
many as possible), but having no way to know whether it is good or not
is very frustrating.  I know data checksums are a step in that
direction, but they are a long way from providing the optimal solution. 
I've implemented rigorous checksums in PgBackRest but something closer
to the source would be even better.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] pg_upgrade and rsync

On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
   Or do you - as the text edited in your patch, but not the quote above -
   mean to run pg_upgrade just on the primary and then rsync?
 
  No, I was going to run it on both, then rsync.

 I'm pretty sure this is all a lot easier than you believe it to be.  If
 you want to recreate what pg_upgrade does to a cluster then the simplest
 thing to do is rsync before removing any of the hard links.  rsync will
 simply recreate the same hard link tree that pg_upgrade created when it
 ran, and update files which were actually changed (the catalog tables).

 I don't understand why that'd be better than simply fixing (yes, that's
 imo the correct term) pg_upgrade to retain relfilenodes across the
 upgrade. Afaics there's no conflict risk and it'd make the clusters much
 more similar, which would be good; independent of rsyncing standbys.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 That's certainly impossible for the system catalogs, which means you
 have to be able to deal with relfilenode discrepancies for them, which
 means that maintaining the same relfilenodes for user tables is of
 dubious value.

 Why is that impossible for the system catalogs?

New versions aren't guaranteed to have the same system catalogs, let alone
the same relfilenodes for them.

regards, tom lane



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Andres Freund

On 2015-01-27 10:20:48 -0500, Robert Haas wrote:
 On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
  I don't understand why that'd be better than simply fixing (yes, that's
  imo the correct term) pg_upgrade to retain relfilenodes across the
  upgrade. Afaics there's no conflict risk and it'd make the clusters much
  more similar, which would be good; independent of rsyncing standbys.
 
  +1.
 
  That's certainly impossible for the system catalogs, which means you
  have to be able to deal with relfilenode discrepancies for them, which
  means that maintaining the same relfilenodes for user tables is of
  dubious value.
 
 Why is that impossible for the system catalogs?

Maybe it's not impossible for existing catalogs, but it's certainly
complicated. But I don't think it's all that desirable anyway: they're
not the same relations after the pg_upgrade (initdb/pg_dump
filled them). That's different for the user defined relations.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  That's certainly impossible for the system catalogs, which means you
  have to be able to deal with relfilenode discrepancies for them, which
  means that maintaining the same relfilenodes for user tables is of
  dubious value.
 
  Why is that impossible for the system catalogs?
 
 New versions aren't guaranteed to have the same system catalogs, let alone
 the same relfilenodes for them.

Indeed, new versions almost certainly have wholly new system catalogs.

While there might be a reason to keep the relfilenodes the same, it
doesn't actually help with the pg_upgrade use-case we're currently
discussing (at least, not without additional help).  The problem is that
we certainly must transfer all the new catalogs, but how would rsync
know that those catalog files have to be transferred but not the user
relations?  Using --size-only would mean that system catalogs whose
sizes happen to match after the upgrade wouldn't be transferred and that
would certainly lead to a corrupt situation.
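The failure mode is easy to reproduce with two small files.  A minimal
sketch, assuming nothing beyond POSIX wc and cksum: the file names and
contents are invented, and size_of and sum_of are hypothetical helpers, not
part of rsync or pg_upgrade.

```shell
# Byte count and CRC of a file, using standard utilities.
size_of() { wc -c < "$1" | tr -d ' '; }
sum_of()  { cksum < "$1" | cut -d ' ' -f 1; }

# Two stand-in "catalog" files with the same length but different content.
tmp=$(mktemp -d)
printf 'catalog-v93' > "$tmp/old"
printf 'catalog-v94' > "$tmp/new"

if [ "$(size_of "$tmp/old")" = "$(size_of "$tmp/new")" ]; then
    echo "size-only comparison: looks unchanged, would be skipped"
fi
if [ "$(sum_of "$tmp/old")" != "$(sum_of "$tmp/new")" ]; then
    echo "checksum comparison: contents differ"
fi

rm -rf "$tmp"
```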

Andres proposed a helper script which would go through the entire tree
on the remote side and set all the timestamps on the remote side to
match those on the local side (prior to the pg_upgrade).  If all the
relfilenodes remained the same and the timestamps on the catalog tables
all changed then it might work to do (without using --size-only):

stop-cluster
set-timestamp-script
pg_upgrade
rsync new_data_dir -> remote:existing_cluster

This would mean that any other files which happened to be changed by
pg_upgrade beyond the catalog tables would also get copied across.  The
issue that I see with that is that if the pg_upgrade process does touch
anything outside of the system catalogs, then its documented revert
mechanism (rename the control file and start the old cluster back up,
prior to having started the new cluster) wouldn't be valid.  Requiring
an extra script which runs around changing timestamps on files is a bit
awkward too, though I suppose possible, and then we'd also have to
document that this process only works with $version of pg_upgrade that
does the preservation of the relfilenodes.
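For illustration only, the timestamp-setting step might look roughly like
the following.  No such script was posted, so everything here is an
assumption: sync_mtimes is a hypothetical name, and both trees are local
scratch directories, whereas the proposal was to do this across hosts.

```shell
# Hypothetical stand-in for the timestamp script: copy each file's
# modification time from a reference tree onto the same relative path
# in a target tree.
sync_mtimes() {
    ref=$1
    target=$2
    ( cd "$ref" && find . -type f ) | while IFS= read -r f; do
        if [ -e "$target/$f" ]; then
            touch -r "$ref/$f" "$target/$f"
        fi
    done
}

# Demo on scratch directories.
tmp=$(mktemp -d)
mkdir -p "$tmp/local/base" "$tmp/remote/base"
echo data > "$tmp/local/base/16384"
echo data > "$tmp/remote/base/16384"
touch -t 201501010000 "$tmp/local/base/16384"   # give the reference an older mtime
sync_mtimes "$tmp/local" "$tmp/remote"

# After syncing, neither copy is newer than the other.
if [ ! "$tmp/local/base/16384" -nt "$tmp/remote/base/16384" ] &&
   [ ! "$tmp/remote/base/16384" -nt "$tmp/local/base/16384" ]; then
    echo "mtimes match"
fi
rm -rf "$tmp"
```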

I suppose there's also technically a race condition to consider: if the
whole thing is scripted and pg_upgrade manages to change an existing
file in the same second that the old cluster did, then that file wouldn't
be recognized by the rsync as having been updated.  That's not too hard
to address, though: just wait a second somewhere in there.  Still, I'm
not sure that this approach really gains us much over the
approach that Bruce is proposing.

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Jan 27, 2015 at 9:36 AM, Stephen Frost sfr...@snowman.net wrote:
  The example listed works, but only when it's a local rsync:
 
  rsync --archive --hard-links --size-only old_dir new_dir remote_dir
 
  Perhaps a better example (or additional one) would be with a remote
  rsync, including clarification of old and new dir, like so:
 
  (run in /var/lib/postgresql)
  rsync --archive --hard-links --size-only \
    9.3/main \
    9.4/main \
    server:/var/lib/postgresql/
 
  Note that 9.3/main and 9.4/main are two source directories for rsync to
  copy over, while server:/var/lib/postgresql/ is a remote destination
  directory.  The above directories match a default Debian/Ubuntu install.
 
 My point is that Bruce's patch suggests looking for remote_dir in
 the rsync documentation, but no such term appears there.

Ah, well, perhaps we could simply add a bit of clarification to this:

for details on specifying <option>remote_dir</>

like so:

for details on specifying the destination <option>remote_dir</>

?

On my system, the rsync man page has '[DEST]' in the synopsis, but it
doesn't actually go on to specifically define what 'DEST' is, rather
referring to it later as 'destination' or 'remote directory'.

I'm sure other suggestions would be welcome if they'd help clarify.

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Bruce Momjian

On Tue, Jan 27, 2015 at 09:36:58AM -0500, Stephen Frost wrote:
 The example listed works, but only when it's a local rsync:
 
 rsync --archive --hard-links --size-only old_dir new_dir remote_dir
 
 Perhaps a better example (or additional one) would be with a remote
 rsync, including clarification of old and new dir, like so:
 
 (run in /var/lib/postgresql)
 rsync --archive --hard-links --size-only \
   9.3/main \
   9.4/main \
   server:/var/lib/postgresql/
 
 Note that 9.3/main and 9.4/main are two source directories for rsync to
 copy over, while server:/var/lib/postgresql/ is a remote destination
 directory.  The above directories match a default Debian/Ubuntu install.

OK, sorry everyone was confused by 'remote_dir'.  Does this new patch
help?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index e1cd260..4e4fe64
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** pg_upgrade.exe
*** 409,414 
--- 409,486 
 </step>
  
 <step>
+ <title>Upgrade any Log-Shipping Standby Servers</title>
+ 
+ <para>
+  If you have Log-Shipping Standby Servers (<xref
+  linkend="warm-standby">), follow these steps to upgrade them (before
+  starting any servers):
+ </para>
+ 
+ <procedure>
+ 
+  <step>
+   <title>Install the new PostgreSQL binaries on standby servers</title>
+ 
+   <para>
+    Make sure the new binaries and support files are installed
+    on all the standby servers.  Do <emphasis>not</> run
+    <application>initdb</>.  If <application>initdb</> was run, delete
+    the standby server data directories.  Also, install any custom
+    shared object files on the new standbys that you installed in the
+    new master cluster.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Run <application>rsync</></title>
+ 
+   <para>
+    From a directory that is above the old and new database cluster
+    directories, run this for each slave:
+ 
+ <programlisting>
+    rsync --archive --hard-links --size-only old_dir new_dir remote_dir
+ </programlisting>
+ 
+    where <option>old_dir</> and <option>new_dir</> are relative
+    to the current directory, and <option>remote_dir</> is
+    <emphasis>above</> the old and new cluster directories on
+    the standby server.  The old and new relative cluster paths
+    must match on the master and standby server.  Consult the
+    <application>rsync</> manual page for details on specifying the
+    remote directory, e.g. <literal>slavehost:/var/lib/postgresql/</>.
+    <application>rsync</> will be fast when <option>--link</> mode is
+    used because it will create hard links on the remote server rather
+    than transfering user data.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Configure log-shipping to standby servers</title>
+ 
+   <para>
+    Configure the servers for log shipping.  (You do not need to run
+    <function>pg_start_backup()</> and <function>pg_stop_backup()</>
+    or take a file system backup as the slaves are still sychronized
+    with the master.)
+   </para>
+  </step>
+ 
+ </procedure>
+ 
+</step>
+ 
+<step>
+ <title>Start the new server</title>
+ 
+ <para>
+  The new server and any <application>rsync</>'ed standby servers can
+  now be safely started.
+ </para>
+</step>
+ 
+<step>
  <title>Post-Upgrade processing</title>
  
  <para>
*** psql --username postgres --file script.s
*** 548,562 
</para>
  
<para>
-A Log-Shipping Standby Server (<xref linkend="warm-standby">) cannot
-be upgraded because the server must allow writes.  The simplest way
-is to upgrade the primary and use <command>rsync</> to rebuild the
-standbys.  You can run <command>rsync</> while the primary is down,
-or as part of a base backup (<xref linkend="backup-base-backup">)
-which overwrites the old standby cluster.
-   </para>
- 
-   <para>
 If you want to use link mode and you do not want your old cluster
 to be modified when the new cluster is started, make a copy of the
 old cluster and upgrade that in link mode. To make a valid copy
--- 620,625 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread Bruce Momjian

On Tue, Jan 27, 2015 at 09:44:51PM -0500, David Steele wrote:
 On 1/27/15 9:32 PM, Bruce Momjian wrote
  Now, this isn't actually a problem for the first time that file is
  backed up- the issue is if that file isn't changed again.  rsync won't
  re-copy it, but that change that rsync missed won't be in the WAL
  history for the *second* backup that's done (only the first), leading to
  a case where that file would end up corrupted.
  Interesting problem, but doesn't rsync use sub-second accuracy?
 
 According to my empirical testing on Linux and OSX the answer is no:
 rsync does not use sub-second accuracy.  This seems to be true even on
 file systems like ext4 that support millisecond mod times, at least it
 was true on Ubuntu 12.04 running ext4.
 
 Even on my laptop there is a full half-second of vulnerability for
 rsync.  Faster systems may have a larger window.

OK, bummer.  Well, I don't think we ever recommend running rsync without
checksums, but the big problem is that rsync doesn't do checksums by
default.  :-(
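
A minimal Python sketch of why a checksum comparison closes the gap that a
size-and-mtime comparison leaves open.  This is only a model, not rsync's
code: quick_check_equal, checksum_equal and demo are hypothetical names, the
whole-second truncation follows the behaviour reported in this thread, and
SHA-256 is used for convenience rather than because rsync uses it.

```python
import hashlib
import os
import tempfile


def quick_check_equal(src, dst):
    """Assumed model: files match if size and whole-second mtime match."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def checksum_equal(src, dst):
    """Compare contents by hash; this reads every byte of both files."""
    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()
    return digest(src) == digest(dst)


def demo():
    """Build two same-size, same-mtime files with different contents."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src_block")
        dst = os.path.join(tmp, "dst_block")
        with open(src, "wb") as f:
            f.write(b"A" * 8192)
        with open(dst, "wb") as f:
            f.write(b"B" * 8192)  # same size, different content
        # Force identical access and modification times on both files.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        os.utime(dst, (1_000_000_000, 1_000_000_000))
        return quick_check_equal(src, dst), checksum_equal(src, dst)
```

demo() returns (True, False): the stat-based check treats the two files as
identical while the content hash does not.  The cost is that both files are
read in full, which is the expense discussed in this thread.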

pg_upgrade recommends using two rsyncs:

   To make a valid copy of the old cluster, use commandrsync/ to create
   a dirty copy of the old cluster while the server is running, then shut
   down the old server and run commandrsync/ again to update the copy
   with any changes to make it consistent.  You might want to exclude some

I am afraid that will not work as it could miss changes, right?  When
would the default mod-time checking ever be safe?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +



Re: [HACKERS] pg_upgrade and rsync

2015-01-27 Thread David Steele

On 1/27/15 9:32 PM, Bruce Momjian wrote
 Now, this isn't actually a problem for the first time that file is
 backed up- the issue is if that file isn't changed again.  rsync won't
 re-copy it, but that change that rsync missed won't be in the WAL
 history for the *second* backup that's done (only the first), leading to
 a case where that file would end up corrupted.
 Interesting problem, but doesn't rsync use sub-second accuracy?

According to my empirical testing on Linux and OSX the answer is no:
rsync does not use sub-second accuracy.  This seems to be true even on
file systems like ext4 that support millisecond mod times, at least it
was true on Ubuntu 12.04 running ext4.

Even on my laptop there is a full half-second of vulnerability for
rsync.  Faster systems may have a larger window.

-- 
- David Steele
da...@pgmasters.net




signature.asc
Description: OpenPGP digital signature

Re: [HACKERS] pg_upgrade and rsync

2015-01-26 Thread Jim Nasby

On 1/23/15 12:40 PM, Stephen Frost wrote:

  That said, the whole timestamp race condition in rsync gives me the
  heebie-jeebies. For normal workloads maybe it's not that big a deal, but when
  dealing with fixed-size data (ie: Postgres blocks)? Eww.

 The race condition is a problem for pg_start/stop_backup and friends.
 In this instance, everything will be shut down when the rsync is
 running, so there isn't a timestamp race condition to worry about.

Yeah, I'm more concerned about people that use rsync to take base backups. Do
we need to explicitly advise against that? Is there a way to work around this
with a sleep after pg_start_backup to make sure all timestamps must be
different? (Admittedly I haven't fully wrapped my head around this yet.)

  How horribly difficult would it be to allow pg_upgrade to operate on multiple servers?
  Could we have it create a shell script instead of directly modifying things itself? Or
  perhaps some custom command file that could then be replayed by pg_upgrade on
  another server? Of course, that's assuming that replicas are compatible enough with masters
  for that to work...

 Yeah, I had suggested that to Bruce also, but it's not clear why that
 would be any different from an rsync --size-only in the end, presuming
 everything went according to plan.

Yeah, if everything is shut down maybe we're OK.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: [HACKERS] pg_upgrade and rsync

2015-01-26 Thread David Steele

On 1/26/15 5:11 PM, Jim Nasby wrote:
 The race condition is a problem for pg_start/stop_backup and friends.
 In this instance, everything will be shut down when the rsync is
 running, so there isn't a timestamp race condition to worry about.

 Yeah, I'm more concerned about people that use rsync to take base
 backups. Do we need to explicitly advise against that? Is there a way
 to work around this with a sleep after pg_start_backup to make sure
 all timestamps must be different? (Admittedly I haven't fully wrapped
 my head around this yet.)
A sleep in pg_start_backup() won't work.  The race condition is in rsync
if the file is modified in the same second after it is copied.  Waiting
until the beginning of the next second in pg_start_backup() would
actually make a bigger window where the issue can occur.

I solved this problem in PgBackRest (an alternative to barman, etc.) by
waiting the remainder of the second after the manifest is built before
copying.  That way, if a file is modified in the second after the
manifest is built that later version will still be copied.  Any mods
after that will be copied in the next backup (as they should be).
PgBackRest does not use rsync, tar, etc., so I was able to code around
the issue.
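
A minimal Python sketch of the "wait out the rest of the second" idea
described above.  It is not PgBackRest's code: the function names and the
injectable clock and sleep parameters are hypothetical, chosen so the logic
can be exercised without real delays.

```python
import math
import time


def seconds_until_next_whole_second(now):
    """Return how long to sleep so the clock reaches the next whole second.

    If `now` is already exactly on a second boundary this returns 1.0,
    which errs on the side of waiting too long.
    """
    return math.floor(now) + 1 - now


def build_manifest_then_wait(build_manifest, clock=time.time, sleep=time.sleep):
    """Build the manifest, then wait until the second it was built in is over.

    Any file modified after the manifest was built but within that same
    second is then still picked up by the copy that follows.
    """
    manifest = build_manifest()
    sleep(seconds_until_next_whole_second(clock()))
    return manifest
```

With a fake clock reading 5.5, the helper sleeps for 0.5 seconds before the
copy phase would begin.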

The interesting thing about this race condition is that it does not
affect the backup where it occurs.  It affects the next backup when the
modified file does not get copied because the timestamp is the same as
the previous backup.  Of course using checksums will solve the problem
in rsync but that's expensive.

Thus my comment earlier that the hot rsync / cold rsync method is not
absolutely safe.  If you do checksums on the cold rsync then you might
as well just use them the first time - you'll have the same downtime
either way.

I've written tests to show the rsync vulnerability and another to show
that this can affect a running database.  However, to reproduce it
reliably you need to force a checkpoint or have them happening pretty
close together.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] pg_upgrade and rsync

2015-01-26 Thread Stephen Frost

Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
 On 1/23/15 12:40 PM, Stephen Frost wrote:
 That said, the whole timestamp race condition in rsync gives me the 
 heebie-jeebies. For normal workloads maybe it's not that big a deal, but 
 when dealing with fixed-size data (ie: Postgres blocks)? Eww.
 The race condition is a problem for pg_start/stop_backup and friends.
 In this instance, everything will be shut down when the rsync is
 running, so there isn't a timestamp race condition to worry about.
 
 Yeah, I'm more concerned about people that use rsync to take base backups. Do 
 we need to explicitly advise against that? Is there a way to work around this 
 with a sleep after pg_start_backup to make sure all timestamps must be 
 different? (Admittedly I haven't fully wrapped my head around this yet.)

I've thought about it a fair bit actually and I agree that there is some
risk to using rsync for *incremental* base backups.  That is, you have
a setup where you loop with:

pg_start_backup
rsync - dest
pg_stop_backup

without using -I, changing what 'dest' is, or making sure it's empty
every time.  The problem is the 1s-level granularity used on the
timestamp.  A possible set of operations, all within 1s, is:

file changed
rsync starts copying the file
file changed again (somewhere prior to where rsync is at)
rsync finishes the file copy

Now, this isn't actually a problem for the first time that file is
backed up- the issue is if that file isn't changed again.  rsync won't
re-copy it, but that change that rsync missed won't be in the WAL
history for the *second* backup that's done (only the first), leading to
a case where that file would end up corrupted.
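
The sequence above can be modelled in a few lines of Python.  This is a
simulation under stated assumptions, not rsync's implementation: FileState,
quick_check_skips and simulate_two_backups are hypothetical names, and the
skip rule (same size, same whole-second mtime) is the behaviour reported in
this thread.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    content: bytes
    mtime: float  # seconds since the epoch, may carry a fractional part


def quick_check_skips(src, dst):
    """Assumed model: skip the file when size and whole-second mtime match."""
    return (len(src.content) == len(dst.content)
            and int(src.mtime) == int(dst.mtime))


def simulate_two_backups():
    # Backup 1: the file changes, then it is copied with its mtime preserved.
    src = FileState(b"v1" * 4096, mtime=100.1)
    dst = FileState(src.content, mtime=src.mtime)
    # The file changes again within the same second, keeping its size.
    src = FileState(b"v2" * 4096, mtime=100.6)
    # Backup 2: the quick check decides whether to copy the file again.
    skipped = quick_check_skips(src, dst)
    return skipped, src.content == dst.content
```

simulate_two_backups() returns (True, False): the second backup skips the
file even though the destination copy no longer matches the source.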

This is a pretty darn narrow situation and one that I doubt many people
will hit, but I do think it's possible.

A way to address this would be to grab all timestamps for all files
at the start of the backup and re-copy any files whose times are changed
after that point (or which were being changed at the time the check was
done, or perhaps simply any file which has a timestamp after the
starting timestamp of the backup).
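
A minimal Python sketch of the last variant above: re-copy any file whose
timestamp is at or after the start of the backup.  The names files_to_recopy
and demo are hypothetical, and comparing at whole-second granularity with
">=" is an assumption made here to stay on the conservative side.

```python
import os
import tempfile


def files_to_recopy(paths, backup_start):
    """Return the paths whose whole-second mtime is at or after the start second."""
    start_second = int(backup_start)
    return [p for p in paths if int(os.stat(p).st_mtime) >= start_second]


def demo():
    """Create one old and one recent file and pick the ones to re-copy."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "old_file")
        new = os.path.join(tmp, "new_file")
        for path, mtime in ((old, 50), (new, 100)):
            with open(path, "wb") as f:
                f.write(b"x")
            os.utime(path, (mtime, mtime))  # set atime and mtime explicitly
        picked = files_to_recopy([old, new], backup_start=100.4)
        return [os.path.basename(p) for p in picked]
```

Here demo() returns ["new_file"]: the file stamped in the same second as the
backup start is selected even though its timestamp is nominally earlier than
100.4.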

 How horribly difficult would it be to allow pg_upgrade to operate on 
 multiple servers? Could we have it create a shell script instead of 
 directly modifying things itself? Or perhaps some custom command file 
 that could then be replayed by pg_upgrade on another server? Of course, 
 that's assuming that replicas are compatible enough with masters for that 
 to work...
 Yeah, I had suggested that to Bruce also, but it's not clear why that
 would be any different from an rsync --size-only in the end, presuming
 everything went according to plan.
 
 Yeah, if everything is shut down maybe we're OK.

Regarding this, yes, I think it 'should' work, but it would definitely
be good to test it quite a bit before relying on it..

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] pg_upgrade and rsync

2015-01-26 Thread Jim Nasby


On 1/26/15 5:08 PM, David Steele wrote:

 I've written tests to show the rsync vulnerability and another to show
 that this can affect a running database.  However, to reproduce it
 reliably you need to force a checkpoint or have them happening pretty
 close together.


Related to this and Stephen's comment about testing... ISTM it would be very 
useful to have a published suite of tests for PITR backups, perhaps even 
utilizing special techniques in Postgres to expose potential failure 
conditions. Similarly, it'd also be nice to have a suite of tests you could run 
to validate a backup that you've restored.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] pg_upgrade and rsync

2015-01-24 Thread Bruce Momjian

On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
   You'd have to replace the existing data directory on the master to do
   that, which pg_upgrade was designed specifically to not do, in case
   things went poorly.
  
  Why? Just rsync the new data directory onto the old directory on the
  standbys. That's fine and simple.
 
 That still doesn't address the need to use --size-only, it would just
 mean that you don't need to use -H.  If anything the -H part is the
 aspect which worries me the least about this approach.

I can now confirm that it works, just as Stephen said.  I was able to
upgrade a standby cluster that contained the regression database, and
the pg_dump output was perfect.

I am attaching doc instructions that I will add to all branches as soon
as someone else confirms my results.  You will need to use rsync
--itemize-changes to see the hard links being created, e.g.:

   hf+ pgsql/data/base/16415/28188 => pgsql.old/data/base/16384/28188

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index e1cd260..91f40ce
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** pg_upgrade.exe
*** 409,414 
--- 409,484 
 </step>
  
 <step>
+ <title>Upgrade any Log-Shipping Standby Servers</title>
+ 
+ <para>
+  If you have Log-Shipping Standby Servers (<xref
+  linkend="warm-standby">), follow these steps to upgrade them (before
+  starting any servers):
+ </para>
+ 
+ <procedure>
+ 
+  <step>
+   <title>Install the new PostgreSQL binaries on standby servers</title>
+ 
+   <para>
+Make sure the new binaries and support files are installed
+on all the standby servers.  Do <emphasis>not</> run
+<application>initdb</>.  If <application>initdb</> was run, delete
+the standby server data directories.  Also, install any custom
+shared object files on the new standbys that you installed in the
+new master cluster.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Run <application>rsync</></title>
+ 
+   <para>
+From a directory that is above the old and new database cluster
+directories, run this for each slave:
+ 
+ <programlisting>
+rsync --archive --hard-links --size-only old_dir new_dir remote_dir
+ </programlisting>
+ 
+where <option>old_dir</> and <option>new_dir</> are relative to the
+current directory, and <option>remote_dir</> is <emphasis>above</>
+the old and new cluster directories on the standby server.  The old
+and new relative cluster paths must match on the master and standby
+server.  Consult the <application>rsync</> manual page for details
+on specifying <option>remote_dir</>.  <application>rsync</> will
+be fast when <option>--link</> mode is used because it will create
+hard links on the remote server rather than transferring user data.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Configure log-shipping to standby servers</title>
+ 
+   <para>
+Configure the servers for log shipping.  (You do not need to run
+<function>pg_start_backup()</> and <function>pg_stop_backup()</>
+or take a file system backup as the slaves are still synchronized
+with the master.)
+   </para>
+  </step>
+ 
+ </procedure>
+ 
+</step>
+ 
+<step>
+ <title>Start the new server</title>
+ 
+ <para>
+  The new server and any <application>rsync</>'ed standby servers can
+  now be safely started.
+ </para>
+</step>
+ 
+<step>
  <title>Post-Upgrade processing</title>
  
  <para>
*** psql --username postgres --file script.s
*** 548,562 
</para>
  
<para>
-A Log-Shipping Standby Server (<xref linkend="warm-standby">) cannot
-be upgraded because the server must allow writes.  The simplest way
-is to upgrade the primary and use <command>rsync</> to rebuild the
-standbys.  You can run <command>rsync</> while the primary is down,
-or as part of a base backup (<xref linkend="backup-base-backup">)
-which overwrites the old standby cluster.
-   </para>
- 
-   <para>
 If you want to use link mode and you do not want your old cluster
 to be modified when the new cluster is started, make a copy of the
 old cluster and upgrade that in link mode. To make a valid copy
--- 618,623 


Re: [HACKERS] pg_upgrade and rsync

2015-01-23 Thread Jim Nasby


On 1/22/15 7:54 PM, Stephen Frost wrote:

 * Bruce Momjian (br...@momjian.us) wrote:

  On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

   Or do you - as the text edited in your patch, but not the quote above -
   mean to run pg_upgrade just on the primary and then rsync?

  No, I was going to run it on both, then rsync.

 I'm pretty sure this is all a lot easier than you believe it to be.  If
 you want to recreate what pg_upgrade does to a cluster then the simplest
 thing to do is rsync before removing any of the hard links.  rsync will
 simply recreate the same hard link tree that pg_upgrade created when it
 ran, and update files which were actually changed (the catalog tables).

 The problem, as mentioned elsewhere, is that you have to checksum all
 the files because the timestamps will differ.  You can actually get
 around that with rsync if you really want though- tell it to only look
 at file sizes instead of size+time by passing in --size-only.


What if instead of trying to handle that on the rsync side, we changed 
pg_upgrade so that it created hardlinks that had the same timestamp as the 
original file?

That said, the whole timestamp race condition in rsync gives me the 
heebie-jeebies. For normal workloads maybe it's not that big a deal, but when 
dealing with fixed-size data (ie: Postgres blocks)? Eww.

How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? 
Could we have it create a shell script instead of directly modifying things itself? Or 
perhaps some custom command file that could then be replayed by pg_upgrade on 
another server? Of course, that's assuming that replicas are compatible enough with 
masters for that to work...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] pg_upgrade and rsync

* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
  That wouldn't actually help with what Bruce is trying to do, which
  is to duplicate the results of the pg_upgrade from the master over to
  the standby.
 
 Well, it'd pretty much obviate the need to run pg_upgrade on the
 standby. As there's no renamed files you don't need to muck around with
 leaving hardlinks in place and such just so that rsync recognizes
 unchanged files.

Uh, pg_upgrade always either creates a hard link tree or copies
everything over.  If I follow what you're suggesting, pg_upgrade would
need a new 'in-place' mode that removes all of the catalog tables from
the old cluster and puts the new catalog tables into place and leaves
everything else alone.

I don't really think I'd want to go there either..

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2015-01-23 14:27:51 -0500, Stephen Frost wrote:
  * Andres Freund (and...@2ndquadrant.com) wrote:
   On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
If I follow what you're suggesting, pg_upgrade would
need a new 'in-place' mode that removes all of the catalog tables from
the old cluster and puts the new catalog tables into place and leaves
everything else alone.
   
   No. Except that it'd preserve the relfilenodes (i.e. the filenames of
   relations) it'd work exactly the same as today. The standby is simply
   updated by rsyncing the new data directory of the primary to the
   standby.
  
  You'd have to replace the existing data directory on the master to do
  that, which pg_upgrade was designed specifically to not do, in case
  things went poorly.
 
 Why? Just rsync the new data directory onto the old directory on the
 standbys. That's fine and simple.

That still doesn't address the need to use --size-only, it would just
mean that you don't need to use -H.  If anything the -H part is the
aspect which worries me the least about this approach.

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

* Jim Nasby (jim.na...@bluetreble.com) wrote:
 On 1/22/15 7:54 PM, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
 On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
  Or do you - as the text edited in your patch, but not the quote above -
  mean to run pg_upgrade just on the primary and then rsync?
 
 No, I was going to run it on both, then rsync.
 I'm pretty sure this is all a lot easier than you believe it to be.  If
 you want to recreate what pg_upgrade does to a cluster then the simplest
 thing to do is rsync before removing any of the hard links.  rsync will
 simply recreate the same hard link tree that pg_upgrade created when it
 ran, and update files which were actually changed (the catalog tables).
 
 The problem, as mentioned elsewhere, is that you have to checksum all
 the files because the timestamps will differ.  You can actually get
 around that with rsync if you really want though- tell it to only look
 at file sizes instead of size+time by passing in --size-only.
 
 What if instead of trying to handle that on the rsync side, we changed 
 pg_upgrade so that it created hardlinks that had the same timestamp as the 
 original file?

So, two things, I chatted w/ Bruce and he was less concerned about the
lack of being able to match up the timestamps than I was.  He has a
point though- the catalog tables are going to get copied anyway since
they won't be hard links and checking that all the other files match in
size and that both the master and the standby are at the same xlog
position should give you a pretty good feeling that everything matches
up sufficiently.

Second, I don't follow what you mean by having pg_upgrade change the
hardlinks to have the same timestamp- for starters, the timestamp is in
the inode and not the actual hard link (two files hard linked together
won't have different timestamps..) and second, the problem isn't on the
master side- it's on the standby side.  The standby's files will have
timestamps different from the master and there really isn't much to be
done about that.
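
The point that the timestamp lives in the inode rather than in the directory
entry can be checked with a short Python sketch.  It assumes a file system
that supports hard links; the function name and the file names are
hypothetical and only echo the relfilenode-style names used in this thread.

```python
import os
import tempfile


def hard_link_shares_inode():
    """Create a file and a hard link to it, then compare their metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "16384_28188")
        linked = os.path.join(tmp, "16415_28188")
        with open(original, "wb") as f:
            f.write(b"\0" * 8192)
        os.link(original, linked)  # a second name for the same inode
        # Change the mtime through one name only.
        os.utime(linked, (1_000_000_000, 1_000_000_000))
        a, b = os.stat(original), os.stat(linked)
        return a.st_ino == b.st_ino, a.st_mtime == b.st_mtime, a.st_nlink
```

Both names report the same inode and the same mtime, and the link count is
2, so there is no way to give the two names different timestamps.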

 That said, the whole timestamp race condition in rsync gives me the 
 heebie-jeebies. For normal workloads maybe it's not that big a deal, but when 
 dealing with fixed-size data (ie: Postgres blocks)? Eww.

The race condition is a problem for pg_start/stop_backup and friends.
In this instance, everything will be shut down when the rsync is
running, so there isn't a timestamp race condition to worry about.

 How horribly difficult would it be to allow pg_upgrade to operate on multiple 
 servers? Could we have it create a shell script instead of directly modifying 
 things itself? Or perhaps some custom command file that could then be 
 replayed by pg_upgrade on another server? Of course, that's assuming that 
 replicas are compatible enough with masters for that to work...

Yeah, I had suggested that to Bruce also, but it's not clear why that
would be any different from an rsync --size-only in the end, presuming
everything went according to plan.

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
   Or do you - as the text edited in your patch, but not the quote above -
   mean to run pg_upgrade just on the primary and then rsync?
  
  No, I was going to run it on both, then rsync.
 
 I'm pretty sure this is all a lot easier than you believe it to be.  If
 you want to recreate what pg_upgrade does to a cluster then the simplest
 thing to do is rsync before removing any of the hard links.  rsync will
 simply recreate the same hard link tree that pg_upgrade created when it
 ran, and update files which were actually changed (the catalog tables).

I don't understand why that'd be better than simply fixing (yes, that's
imo the correct term) pg_upgrade to retain relfilenodes across the
upgrade. Afaics there's no conflict risk and it'd make the clusters much
more similar, which would be good; independent of rsyncing standbys.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_upgrade and rsync

On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
   * Bruce Momjian (br...@momjian.us) wrote:
On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
 Or do you - as the text edited in your patch, but not the quote above 
 -
 mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.
   
   I'm pretty sure this is all a lot easier than you believe it to be.  If
   you want to recreate what pg_upgrade does to a cluster then the simplest
   thing to do is rsync before removing any of the hard links.  rsync will
   simply recreate the same hard link tree that pg_upgrade created when it
   ran, and update files which were actually changed (the catalog tables).
  
  I don't understand why that'd be better than simply fixing (yes, that's
  imo the correct term) pg_upgrade to retain relfilenodes across the
  upgrade. Afaics there's no conflict risk and it'd make the clusters much
  more similar, which would be good; independent of rsyncing standbys.
 
 That's an entirely orthogonal discussion from the original one though,
 no?

Don't think so.

 That wouldn't actually help with what Bruce is trying to do, which
 is to duplicate the results of the pg_upgrade from the master over to
 the standby.

Well, it'd pretty much obviate the need to run pg_upgrade on the
standby. As there's no renamed files you don't need to muck around with
leaving hardlinks in place and such just so that rsync recognizes
unchanged files.

 Trying to pg_upgrade both the master and the standby, to me at least,
 seems like an even *worse* approach than trusting rsync with -H and
 --size-only..

I think running pg_upgrade on the standby is a dangerous folly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_upgrade and rsync

On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
   That wouldn't actually help with what Bruce is trying to do, which
   is to duplicate the results of the pg_upgrade from the master over to
   the standby.
  
  Well, it'd pretty much obviate the need to run pg_upgrade on the
  standby. As there's no renamed files you don't need to muck around with
  leaving hardlinks in place and such just so that rsync recognizes
  unchanged files.
 
 Uh, pg_upgrade always either creates a hard link tree or copies
 everything over.

Yes. The problem is that the filenames after pg_upgrade aren't the same
as before. Which means that a simple rsync call won't be able to save
anything because the standby's filenames differ.  What you can do is
rsync both cluster directories (i.e. the old and the post pg_upgrade
ones) and use rsync -H, right? Without transferring both -H won't detect
the hardlinks as they need to be in the synced set. That's pretty
cumbersome/complicated, and far from cheap.

 If I follow what you're suggesting, pg_upgrade would
 need a new 'in-place' mode that removes all of the catalog tables from
 the old cluster and puts the new catalog tables into place and leaves
 everything else alone.

No. Except that it'd preserve the relfilenodes (i.e. the filenames of
relations) it'd work exactly the same as today. The standby is simply
updated by rsyncing the new data directory of the primary to the
standby.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_upgrade and rsync

* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
  * Bruce Momjian (br...@momjian.us) wrote:
   On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?
   
   No, I was going to run it on both, then rsync.
  
  I'm pretty sure this is all a lot easier than you believe it to be.  If
  you want to recreate what pg_upgrade does to a cluster then the simplest
  thing to do is rsync before removing any of the hard links.  rsync will
  simply recreate the same hard link tree that pg_upgrade created when it
  ran, and update files which were actually changed (the catalog tables).
 
 I don't understand why that'd be better than simply fixing (yes, that's
 imo the correct term) pg_upgrade to retain relfilenodes across the
 upgrade. Afaics there's no conflict risk and it'd make the clusters much
 more similar, which would be good; independent of rsyncing standbys.

That's an entirely orthogonal discussion from the original one though,
no?  That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.  Even if the relfilenodes were the same across the upgrade,
I don't think it'd be a good idea to run pg_upgrade on the standby and
hope the results match close enough to the master that you can trust
updates to the catalog tables on the standby from the master going
forward to work..

Trying to pg_upgrade both the master and the standby, to me at least,
seems like an even *worse* approach than trusting rsync with -H and
--size-only..

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync

* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
  * Andres Freund (and...@2ndquadrant.com) wrote:
   On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.
   
   Well, it'd pretty much obviate the need to run pg_upgrade on the
   standby. As there's no renamed files you don't need to muck around with
   leaving hardlinks in place and such just so that rsync recognizes
   unchanged files.
  
  Uh, pg_upgrade always either creates a hard link tree or copies
  everything over.
 
 Yes. The problem is that the filenames after pg_upgrade aren't the same
 as before. Which means that a simple rsync call won't be able to save
 anything because the standby's filenames differ.  What you can do is
 rsync both cluster directories (i.e. the old and the post pg_upgrade
 ones) and use rsync -H, right? Without transferring both -H won't detect
 the hardlinks as they need to be in the synced set. That's pretty
 cumbersome/complicated, and far from cheap.

The filenames don't need to be the same for rsync -H to work.  You
specifically do *not* want to independently rsync the old and new
clusters- you need to run a single rsync (and one for each tablespace)
with -H and then it'll realize that the old cluster on both systems is
identical and will just recreate the hard links, and copy the completely
new files (the catalog tables).
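
To see why both cluster directories have to be part of one transfer, here is
a minimal Python sketch that finds hard-linked files by grouping paths on
(device, inode).  It is only a model of the information a hard-link-aware
copy needs, not rsync's implementation; hard_link_groups, demo and the
directory layout are hypothetical.

```python
import os
import tempfile
from collections import defaultdict


def hard_link_groups(root):
    """Return sorted groups of relative paths under root that share an inode."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            groups[(st.st_dev, st.st_ino)].append(os.path.relpath(path, root))
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


def demo():
    """Mimic link mode: one user file linked into the new cluster, one new file."""
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "old_cluster")
        new = os.path.join(root, "new_cluster")
        os.makedirs(old)
        os.makedirs(new)
        with open(os.path.join(old, "28188"), "wb") as f:
            f.write(b"user data")
        # Same data file under a different name in the new cluster.
        os.link(os.path.join(old, "28188"), os.path.join(new, "28190"))
        with open(os.path.join(new, "1259"), "wb") as f:
            f.write(b"new catalog")  # a genuinely new file, not linked
        return hard_link_groups(root)
```

Scanning the parent directory finds the pair old_cluster/28188 and
new_cluster/28190 even though the names differ.  Scanning either cluster
directory on its own would find no pairs, which is why a single run over
both is needed.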

  If I follow what you're suggesting, pg_upgrade would
  need a new 'in-place' mode that removes all of the catalog tables from
  the old cluster and puts the new catalog tables into place and leaves
  everything else alone.
 
 No. Except that it'd preserve the relfilenodes (i.e. the filenames of
 relations) it'd work exactly the same as today. The standby is simply
 updated by rsyncing the new data directory of the primary to the
 standby.

You'd have to replace the existing data directory on the master to do
that, which pg_upgrade was designed specifically to not do, in case
things went poorly.  You'd still have to deal with the tablespace
directories being renamed also, since we include the major version and
catalog build in the directory name..

This whole process really isn't all that complicated in the end..

my_data_dir/old_cluster
my_data_dir/new_cluster

pg_upgrade
rsync -H --size-only my_data_dir/ standby:/path/to/my_data_dir
start the clusters
remove the old cluster on the master and standby.
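
A small Python sketch that assembles the rsync invocation for one standby,
using the long option names from the proposed documentation patch
(--archive, --hard-links, --size-only) plus the optional --itemize-changes
mentioned in this thread.  The function names are hypothetical, and the
paths in the usage example are the placeholders from the steps above.

```python
import shlex


def standby_rsync_command(old_dir, new_dir, remote_dir, itemize=False):
    """Build the argv for one rsync run covering both cluster directories."""
    argv = ["rsync", "--archive", "--hard-links", "--size-only"]
    if itemize:
        argv.append("--itemize-changes")  # show hard links being created
    argv += [old_dir, new_dir, remote_dir]
    return argv


def as_shell(argv):
    """Render the argv as a shell command line, quoting where needed."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, as_shell(standby_rsync_command("old_cluster", "new_cluster",
"standby:/path/to/my_data_dir")) yields the command to run from
my_data_dir on the master while all servers are shut down.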

Thanks,

Stephen



Re: [HACKERS] pg_upgrade and rsync