Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

Linas Virbalas Thu, 22 Sep 2011 07:25:32 -0700

>>> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
>>> while as master is loaded up);
>> 
>> No. if you use pg_start_backup('foo', true) it will be fast. Check the
>> manual.
> 
> If the server is sufficiently heavily loaded that a checkpoint takes a
> nontrivial amount of time, the OP is correct that this will be not
> fast, regardless of whether you choose to force an immediate
> checkpoint.


In order to check more cases, I have changed the procedure to force an
immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With
the same load generator running, pg_start_backup returned almost
instantaneously compared to how long it took previously.

Most importantly, after making this change, I can no longer reproduce the
pg_clog error message. In other words, with an immediate checkpoint the hot
backup succeeds under this load!
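
For reference, the two variants differ only in the second argument. Below is
a minimal sketch that builds both statements; the helper name
start_backup_sql is hypothetical, and wrapping the call in
pg_xlogfile_name() simply mirrors what is recorded at the start of the
backup.

```python
def start_backup_sql(label, fast=False):
    """Build the SQL that starts a base backup and reports the starting WAL file.

    Hypothetical helper for illustration: fast=True adds the second
    argument, which requests an immediate checkpoint.
    """
    # Double any single quotes so the label is a valid SQL string literal.
    quoted = "'" + label.replace("'", "''") + "'"
    args = quoted + (", true" if fast else "")
    return "SELECT pg_xlogfile_name(pg_start_backup(" + args + "));"


# Spread checkpoint (the original procedure) versus immediate checkpoint.
print(start_backup_sql("backup_under_load"))
print(start_backup_sql("backup_under_load", fast=True))
```

The resulting statements can be passed to psql or any client library.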

>>> 2.3. rsync data/global/pg_control to the standby;
>> 
>> Why are you doing this? If ...
>> 
>>> 2.4. rsync all other data/ (without pg_xlog) to the standby;
>> 
>> you will copy it again or no? Don't understand your point.
> 
> His point is that exercising the bug depends on doing the copying in a
> certain order.  Any order of copying the data theoretically ought to
> be OK, as long as it's all between starting the backup and stopping
> the backup, but apparently it isn't.

Please note that in the past I was able to reproduce the same pg_clog error
even with a single rsync pass over the whole data/ directory (i.e. without
splitting it into two steps).

>> The problem could be that the minimum recovery point (step 2.3) is different
>> from the end of rsync if you are under load.

Do you have any idea why the hot backup with
pg_start_backup('backup_under_load', true) succeeds while
pg_start_backup('backup_under_load') fails under the same load?

Originally, I was using pg_start_backup('backup_under_load') so as not to
overload the master with the I/O that an immediate checkpoint requires. It
now seems that this has to be sacrificed for the sake of a successful backup
under load.

> It seems pretty clear that some relevant chunk of WAL isn't getting
> replayed, but it's not at all clear to me why not.  It seems like it
> would be useful to compare the LSN returned by pg_start_backup() with

If needed, I could do that, if I had the exact procedure... Currently,
during the start of the backup I take the following information:

pg_xlogfile_name(pg_start_backup(...))
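
To compare the two locations, it may help to see how an LSN maps to a WAL
file name. This is only a rough sketch under stated assumptions: the
function names are hypothetical, the segment size is assumed to be the
default 16 MB, the timeline ID has to be supplied by hand, and the
"%08X%08X%08X" layout is the file naming used by this generation of
releases.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size (16 MB)


def parse_lsn(lsn):
    """Split an LSN such as '0/D3DA3C8' into (log id, byte offset in that log)."""
    log_hex, offset_hex = lsn.split("/")
    return int(log_hex, 16), int(offset_hex, 16)


def wal_file_name(lsn, timeline=1):
    """Name of the WAL segment file that contains the given LSN."""
    log_id, offset = parse_lsn(lsn)
    segment = offset // WAL_SEGMENT_SIZE
    # Timeline, log id and segment number, each as 8 upper-case hex digits.
    return "%08X%08X%08X" % (timeline, log_id, segment)


def replay_covers_backup_start(replay_lsn, backup_start_lsn):
    """True if replay begins at or before the location pg_start_backup() returned."""
    # Tuples compare the log id first and the offset second.
    return parse_lsn(replay_lsn) <= parse_lsn(backup_start_lsn)


print(wal_file_name("0/D3DA3C8"))  # 00000001000000000000000D
print(replay_covers_backup_start("0/D000020", "0/D3DA3C8"))  # True
```

If the standby reports a replay start location later than the one returned
by pg_start_backup(), that would be consistent with some WAL not being
replayed.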

> the location at which replay begins when you fire up the clone.

As you have seen in my original message, in the pg_log I get only the
restored WAL file names after starting up the standby. Can I tune the
postgresql.conf to include the location at which replay begins in the log?

> Could you provide us with the exact rsync version and parameters you use?

rsync -azv
version 2.6.8  protocol version 29

--
Sincerely,
Linas Virbalas
http://flyingclusters.blogspot.com/

