Re: [HACKERS] Broken hint bits (freeze)

Amit Kapila Thu, 08 Jun 2017 05:59:18 -0700

On Fri, Jun 2, 2017 at 4:20 PM, Dmitriy Sarafannikov
<dsarafanni...@yandex.ru> wrote:
> Thanks for all.
>
> We found the source of the problem. It was mistake in upgrade to 9.6.
>
> We upgrade replica with rsync as it is in the documentation:
> rsync --verbose --relative --archive --hard-links --size-only old_pgdata 
> new_pgdata remote_dir
>
> We must provide 100% read-only availability of our shard (master + 2 
> replicas).
> So we can’t stop master and both replicas, upgrade them one by one and start 
> them.
> We do it as follows:
> Close master from load, stop master, upgrade it, stop 1st replica, upgrade 
> it, start 1st replica,
> stop 2nd replica, upgrade it, start 2nd replica, start master, open master.
> But upgraded replicas died under load without statistics and we decided to 
> perform
> analyze on master before upgrading replicas. In this case statistics would be 
> copied to replicas by rsync.
> The upgrade algorithm became as follows:
> Close master, stop master, close master from replicas (iptables), upgrade 
> master,
> start master, perform analyze, stop master, stop 1st replica, upgrade 1st 
> replica,
> start 1st replica, stop 2nd replica, upgrade 2nd replica, start 2nd replica,
> start master, open master.
>
> If autovacuum starts vacuuming relations while we are performing analyze, wal 
> records
> generated by it will not be replayed on replicas, because next step is 
> stopping
> master with checkpoint and new redo location LSN (newer that these wal 
> records)
> will appear in pg_control file, which then will be copied by rsync to 
> replicas.
>
> If it was simple vacuum, we most likely will not see the consequences. 
> Because it marks
> tuples as deleted, and some of the next new tuples will be placed here, and 
> due to FPW
> replicas will receive correct page, identical to master.
> But if it was vacuum to prevent wraparound, we will see situation like ours. 
> Tuples on
> master will be frozen, but on replicas not. And it will not change if nobody 
> will not
> update any tuple on this page.
>


Why didn't rsync made the copies on master and replica same?

> It’s dangerous, because, if we perform switchover to replica, «corrupted» page
> will be delivered to all replicas after next update of any tuple from this 
> page.
>
> We reproduced this case in our test environment and this assumption was 
> confirmed.
>
> Starting and stopping master after running pg_upgrade but before rsync to 
> collect statistics
> was a bad idea.
>
> We know how to find such «corrupted» tuples. And we want to fix this by 
> manually
> freezing tuples via calling specially written C functions. Functions are 
> «copy-pasted»
> and simplified code from vacuum functions with SQL interface (see attachment).
> Can you look on them? Do you think it is safe to use them for fixing 
> corrupted pages
> or is there a better way not to loose data?
>

I haven't looked in detail, but it sounds slightly risky proposition
to manipulate the tuples by writing C functions of the form you have
in your code.  I would have preferred some way to avoid this problem
by ensuring that replicas are properly synced (complete data of master
via WAL) or by disabling autovacuum.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Broken hint bits (freeze)

Reply via email to