>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:

 Tom>      * During recovery we ignore killed tuples and don't bother to kill them
 Tom>      * either. We do this because the xmin on the primary node could easily be
 Tom>      * later than the xmin on the standby node, so that what the primary
 Tom>      * thinks is killed is supposed to be visible on standby. So for correct
 Tom>      * MVCC for queries during recovery we must ignore these hints and check
 Tom>      * all tuples. Do *not* set ignore_killed_tuples to true when running in a
 Tom>      * transaction that was started during recovery. xactStartedInRecovery
 Tom>      * should not be altered by index AMs.

 Tom> but it seems to me that this is not terribly carefully thought through.

 Tom> 1. If global xmin on the primary is later than on the standby,
 Tom> VACUUM could remove tuples that should be visible on the standby,
 Tom> and that would shortly propagate to the standby. Simply ignoring
 Tom> index kill bits on the standby won't prevent that.

Right, but we have conflict detection for vacuum's tuple removal
actions, which we don't have for the index hints.

 Tom> 2. Although _bt_killitems doesn't WAL-log its setting of kill
 Tom> bits, those bits could propagate to the standby anyway, as a
 Tom> result of a subsequent WAL action on the index page that gets a
 Tom> full-page image added.

That's OK as long as we're ignoring those hints on the standby.
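
To make that concrete, here is a toy model of the rule in the quoted
comment. It is only a sketch: ToyIndexItem, ToyScan, toy_scan_init and
toy_scan_candidates are made-up names for illustration, not the real
index AM structures or functions. The one thing taken from the comment
is that ignore_killed_tuples must stay false in a transaction that was
started during recovery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's actual structs. */
typedef struct ToyIndexItem {
    int  heap_row;  /* which heap row this index entry points at */
    bool killed;    /* kill-bit hint; may reach a standby via a full-page image */
} ToyIndexItem;

typedef struct ToyScan {
    bool xact_started_in_recovery;
    bool ignore_killed_tuples;  /* true means "skip entries whose kill bit is set" */
} ToyScan;

/* Per the quoted comment: never trust kill bits in a scan whose
 * transaction started during recovery. */
static void toy_scan_init(ToyScan *scan, bool started_in_recovery)
{
    scan->xact_started_in_recovery = started_in_recovery;
    scan->ignore_killed_tuples = !started_in_recovery;
}

/* Count the index entries the scan would still go and check in the heap. */
static size_t toy_scan_candidates(const ToyScan *scan,
                                  const ToyIndexItem *items, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (scan->ignore_killed_tuples && items[i].killed)
            continue;  /* hint honoured: entry skipped without a heap check */
        count++;
    }
    return count;
}
```

In this model a scan on the standby visits every entry, kill bit or
not, so a stray bit that arrived in a full-page image costs nothing
but an extra heap visibility check.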

 Tom> I believe that in some replication modes, #1 isn't a problem
 Tom> because we have mechanisms to hold back the primary's global xmin.

That's the case if feedback is enabled, but not if it's disabled, which
is sometimes done intentionally to ensure that long-running queries on
the standby don't hold back the master's xmin horizon.

Conflict detection then comes into play to kill the aforesaid
long-running queries before vacuuming away anything they might see.
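
As a rough sketch of that conflict check (again a toy model, not the
actual recovery code): ToyXid, ToyStandbyQuery and toy_replay_cleanup
are hypothetical names, xids are compared as plain integers with
wraparound ignored, and any grace period a real standby allows before
cancelling is left out. The assumption is that a cleanup record carries
the latest xid whose dead tuples were removed, and that any standby
query whose snapshot xmin is at or before that xid might still need
those tuples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;  /* hypothetical; wraparound is ignored in this sketch */

typedef struct ToyStandbyQuery {
    ToyXid snapshot_xmin;  /* oldest xid the query's snapshot may still need */
    bool   cancelled;
} ToyStandbyQuery;

/*
 * Replay a cleanup record that removed tuples deleted by xids up to and
 * including latest_removed_xid. Any live query that might still see
 * those tuples is cancelled before the removal is applied.
 * Returns the number of queries cancelled by this record.
 */
static size_t toy_replay_cleanup(ToyStandbyQuery *queries, size_t n,
                                 ToyXid latest_removed_xid)
{
    size_t cancelled = 0;

    for (size_t i = 0; i < n; i++) {
        if (!queries[i].cancelled &&
            queries[i].snapshot_xmin <= latest_removed_xid) {
            queries[i].cancelled = true;
            cancelled++;
        }
    }
    return cancelled;
}
```

The index kill bits have no equivalent of latest_removed_xid attached
to them, which is why the standby can only ignore them rather than
resolve a conflict.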

-- 
Andrew (irc:RhodiumToad)
