On Fri, Jun 2, 2017 at 4:20 PM, Dmitriy Sarafannikov
<dsarafanni...@yandex.ru> wrote:
> Thanks for all.
> We found the source of the problem. It was mistake in upgrade to 9.6.
> We upgrade replica with rsync as it is in the documentation:
> rsync --verbose --relative --archive --hard-links --size-only old_pgdata 
> new_pgdata remote_dir
> We must provide 100% read-only availability of our shard (master + 2 
> replicas).
> So we can’t stop master and both replicas, upgrade them one by one and start 
> them.
> We do it as follows:
> Close master from load, stop master, upgrade it, stop 1st replica, upgrade 
> it, start 1st replica,
> stop 2nd replica, upgrade it, start 2nd replica, start master, open master.
> But upgraded replicas died under load without statistics and we decided to 
> perform
> analyze on master before upgrading replicas. In this case statistics would be 
> copied to replicas by rsync.
> The upgrade algorithm became as follows:
> Close master, stop master, close master from replicas (iptables), upgrade 
> master,
> start master, perform analyze, stop master, stop 1st replica, upgrade 1st 
> replica,
> start 1st replica, stop 2nd replica, upgrade 2nd replica, start 2nd replica,
> start master, open master.
> If autovacuum starts vacuuming relations while we are performing analyze, wal 
> records
> generated by it will not be replayed on replicas, because next step is 
> stopping
> master with checkpoint and new redo location LSN (newer that these wal 
> records)
> will appear in pg_control file, which then will be copied by rsync to 
> replicas.
> If it was simple vacuum, we most likely will not see the consequences. 
> Because it marks
> tuples as deleted, and some of the next new tuples will be placed here, and 
> due to FPW
> replicas will receive correct page, identical to master.
> But if it was vacuum to prevent wraparound, we will see situation like ours. 
> Tuples on
> master will be frozen, but on replicas not. And it will not change if nobody 
> will not
> update any tuple on this page.

Why didn't rsync made the copies on master and replica same?

> It’s dangerous, because, if we perform switchover to replica, «corrupted» page
> will be delivered to all replicas after next update of any tuple from this 
> page.
> We reproduced this case in our test environment and this assumption was 
> confirmed.
> Starting and stopping master after running pg_upgrade but before rsync to 
> collect statistics
> was a bad idea.
> We know how to find such «corrupted» tuples. And we want to fix this by 
> manually
> freezing tuples via calling specially written C functions. Functions are 
> «copy-pasted»
> and simplified code from vacuum functions with SQL interface (see attachment).
> Can you look on them? Do you think it is safe to use them for fixing 
> corrupted pages
> or is there a better way not to loose data?

I haven't looked in detail, but it sounds slightly risky proposition
to manipulate the tuples by writing C functions of the form you have
in your code.  I would have preferred some way to avoid this problem
by ensuring that replicas are properly synced (complete data of master
via WAL) or by disabling autovacuum.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to