>>> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
>>> while as master is loaded up);
>>
>> No. if you use pg_start_backup('foo', true) it will be fast. Check the
>> manual.
>
> If the server is sufficiently heavily loaded that a checkpoint takes a
> nontrivial amount of time, the OP is correct that this will be not
> fast, regardless of whether you choose to force an immediate
> checkpoint.

In order to check more cases, I have changed the procedure to force an
immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With
the same load generator running, pg_start_backup returned almost
instantaneously compared to how long it took previously.

Most importantly, after making this change, I cannot reproduce the pg_clog
error message anymore. In other words, with an immediate checkpoint the hot
backup succeeds under this load!

>>> 2.3. rsync data/global/pg_control to the standby;
>>
>> Why are you doing this? If ...
>>
>>> 2.4. rsync all other data/ (without pg_xlog) to the standby;
>>
>> you will copy it again or no? Don't understand your point.
>
> His point is that exercising the bug depends on doing the copying in a
> certain order. Any order of copying the data theoretically ought to
> be OK, as long as it's all between starting the backup and stopping
> the backup, but apparently it isn't.

Please note that in the past I was able to reproduce the same pg_clog error
even when taking a single rsync of the whole data/ folder (i.e. without
splitting it into two steps).

>> The problem could be that the minimum recovery point (step 2.3) is different
>> from the end of rsync if you are under load.

Do you have any idea why the hot backup operation succeeds with
pg_start_backup('backup_under_load', true) while it fails with
pg_start_backup('backup_under_load') under the same load?

Originally, I was using pg_start_backup('backup_under_load') in order not to
clog the master server with the I/O required for the checkpoint. Of course,
it now seems that this should be sacrificed for the sake of a successful
backup under load.

> It seems pretty clear that some relevant chunk of WAL isn't getting
> replayed, but it's not at all clear to me why not. It seems like it
> would be useful to compare the LSN returned by pg_start_backup() with

If needed, I could do that, if I had the exact procedure...

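For what it is worth, the comparison itself could be sketched roughly as below. This is only a minimal sketch under assumptions: the helper names are made up for illustration, the default 16 MB WAL segment size and timeline 1 are assumed, and the replay start location is taken as a given input however it ends up being obtained on the standby.

```python
# Hypothetical helpers for comparing two WAL locations (LSNs).
# Assumption: default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn):
    """Convert an LSN such as '0/3000020' into a single integer byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(lsn, timeline=1):
    """Name of the WAL segment that contains the byte at this LSN.

    Intended to match what pg_xlogfile_name() reports for a location that
    is not exactly on a segment boundary; boundary cases are not handled.
    """
    high, low = lsn.split("/")
    segment = int(low, 16) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, int(high, 16), segment)


def replay_covers_backup_start(backup_start_lsn, replay_start_lsn):
    """True if replay begins at or before the LSN returned by pg_start_backup()."""
    return parse_lsn(replay_start_lsn) <= parse_lsn(backup_start_lsn)
```

With illustrative values, wal_file_name('0/3000020') gives 000000010000000000000003, and replay_covers_backup_start('0/3000020', '0/4000000') is False, which would be the suspicious case where replay starts after the backup start location.
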
Currently, at the start of the backup I record the following information:
pg_xlogfile_name(pg_start_backup(...))

> the location at which replay begins when you fire up the clone.

As you have seen in my original message, in the pg_log I get only the
restored WAL file names after starting up the standby. Can I tune
postgresql.conf to include the location at which replay begins in the log?

> Could you provide us with the exact rsync version and parameters you use?

rsync -azv
version 2.6.8 protocol version 29

--
Sincerely,
Linas Virbalas
http://flyingclusters.blogspot.com/