Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 05/01/11 04:43, Devrim GÜNDÜZ wrote: On Fri, 2010-12-31 at 11:11 +1300, Mark Kirkwood wrote: I note that this uninitialized pages with standbys has cropped up from time to time - I wonder if in most/all the cases folk were using Pitrtools? I deployed Pitrtools a lot when I was working for CMD, and I haven't seen any issues with that. It is just a wrapper, nothing else... Note that if you perform the backup step when the system is quiet then there is typically no problem with the standby - it is only when the backup coincides with any level of activity that you can elicit the previously mentioned problems. Obviously the case I have included in the first mail shows up the problem immediately, but in fact it is reasonably hard to trigger and I spent quite a while getting to the state of being able to trigger the uninitialized pages on demand. While Pitrtools it is a wrapper, the use of "--exclude=backup_label" is incorrect and needs to be removed. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On Fri, 2010-12-31 at 11:11 +1300, Mark Kirkwood wrote: > > I note that this uninitialized pages with standbys has cropped up from > time to time - I wonder if in most/all the cases folk were using > Pitrtools? I deployed Pitrtools a lot when I was working for CMD, and I haven't seen any issues with that. It is just a wrapper, nothing else... Regards, -- Devrim GÜNDÜZ PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer PostgreSQL RPM Repository: http://yum.pgrpms.org Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz signature.asc Description: This is a digitally signed message part
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 31/12/10 11:11, Mark Kirkwood wrote: Yes, you (and Robert) are entirely correct, I was confused in my understanding of the "--delete --exclude=backup_label" and thought it to mean "exclude the backup label from the delete". Yeah the --delete is harmless, it is the exclude backup_label that is causing the problem. Note to all current Pitrtools users, this impacts you! We need to get a corrected version out soon I would think. Also (not surprisingly) I can confirm that data corruption is possible: 1/ Perform approx 14 transactions against the primary 2/ Cancel Pgbench 3/ Issue "SELECT pg_switch_xlog()" on primary 4/ Bring up standby after checking it has applied last log The resulting primary and standby should be identical, but: primary: bench=# SELECT count(*) FROM branches; count --- 100 bench=# SELECT count(*) FROM accounts; count -- 1000 standby: bench=# SELECT count(*) FROM branches; count --- 132 bench=# SELECT count(*) FROM accounts; count - 9998269 The other counts are the same. We have lost some accounts records, but have gained duplicates in branches: bench=# REINDEX TABLE branches; ERROR: could not create unique index "branches_pkey" DETAIL: Table contains duplicated values. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 31/12/10 11:01, Heikki Linnakangas wrote: On 30.12.2010 10:55, Mark Kirkwood wrote: Removing the offending --delete --exclude=backup_label options from the base backup step makes everything work properly again. I don't see why --delete would make any difference, but you shouldn't exclude backup_label from the base backup. The backup label file is an important part of the online backup, it cannot be recovered safely without it. Yes, you (and Robert) are entirely correct, I was confused in my understanding of the "--delete --exclude=backup_label" and thought it to mean "exclude the backup label from the delete". Yeah the --delete is harmless, it is the exclude backup_label that is causing the problem. Note to all current Pitrtools users, this impacts you! We need to get a corrected version out soon I would think. I note that this uninitialized pages with standbys has cropped up from time to time - I wonder if in most/all the cases folk were using Pitrtools? regards Mark
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 30.12.2010 10:55, Mark Kirkwood wrote: Removing the offending --delete --exclude=backup_label options from the base backup step makes everything work properly again. I don't see why --delete would make any difference, but you shouldn't exclude backup_label from the base backup. The backup label file is an important part of the online backup, it cannot be recovered safely without it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On Thu, Dec 30, 2010 at 3:55 AM, Mark Kirkwood wrote: > Well, it is none of the things I considered. > > The problem seems to be due to use of "--delete" in the base backup rsync > (see diff attached). In fact I can now reproduce the uninitialized pages > using the "bare bones" method: Any time a relation is extended, we end up with a page of all zeros at the end until the updated page is written out, which often doesn't happen until the next checkpoint. So it doesn't seem too mysterious that you could end up with all zeroes pages on the standby initially, but WAL replay ought to fix that. I suppose the reason it isn't is because you've excluded the backup label, so recovery will begin from the wrong place. Unless I'm missing something, that seems like a really bad idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
Well, it is none of the things I considered. The problem seems to be due to use of "--delete" in the base backup rsync (see diff attached). In fact I can now reproduce the uninitialized pages using the "bare bones" method: primary: $ grep archive_command postgresql.conf archive_command = 'rsync %p standby:/var/lib/postgresql/archive' $ pgbench -c 4 -t 20 bench (wait for approx 1 transactions) standby: $ psql -h primary -c "SELECT pg_start_backup('backup');" $ rsync --exclude pg_xlog/\* --exclude postmaster.pid --delete --exclude=backup_label \ primary:/var/lib/postgresql/8.3/main/* \ /var/lib/postgresql/8.3/main $ psql -h primary -c "SELECT pg_stop_backup(); $ grep restore_command recovery.conf restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -t /tmp/trigger.5432 /var/lib/postgresql/archive %f %p %r' $ /etc/init.d/postgresql-8.3 start (wait for approx 14 transactions) $ touch /tmp/trigger.5432 Removing the offending --delete --exclude=backup_label options from the base backup step makes everything work properly again. I'd be interested to know if the other folks getting these warnings were using unusual rsync options either during backup or for archiving. regards Mark On 30/12/10 13:32, Mark Kirkwood wrote: I'm frankly puzzled about what Pitrtools is doing that is different - I only noticed it using rsync compression (-z) and doing rsync backups via pulling from the standby rather than pushing from the primary (I'm in the process of trying these variations out in the bare bones case). Just as I'm writing this I see Pitrtools rsync's pg_xlog - I wonder if there is/are timing issues which mean that recovery might use some (corrupted) logs from there before the (clean) archived ones arrive (will check). *** cmd_standby.orig Tue Dec 28 21:10:31 2010 --- cmd_standby Thu Dec 30 05:20:04 2010 *** *** 175,181 if debug == "on": ssh_flags = "-vvv -o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout)) !rsync_flags = "-avzl --delete --stats --exclude=backup_label" pg_standby_flags = "-s5 -w0 -d -c" if pgversion == '8.2': pg_standby_args = "%%f %%p -k%s" % (float(numarchives)) --- 175,181 if debug == "on": ssh_flags = "-vvv -o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout)) !rsync_flags = "-a " pg_standby_flags = "-s5 -w0 -d -c" if pgversion == '8.2': pg_standby_args = "%%f %%p -k%s" % (float(numarchives)) *** *** 184,190 else: ssh_flags = "-o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout)) !rsync_flags = "-azl --delete --exclude=backup_label" pg_standby_flags = "-s5 -w0 -c" if pgversion == '8.2': pg_standby_args = "%%f %%p -k%s" % (float(numarchives)) --- 184,190 else: ssh_flags = "-o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout)) !rsync_flags = "-azl " pg_standby_flags = "-s5 -w0 -c" if pgversion == '8.2': pg_standby_args = "%%f %%p -k%s" % (float(numarchives)) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers