Hello again.

Andres, Peter, thanks for your comments.

Some of the issues you mentioned (reporting feedback to another cascade standby, processing queries after a restart, and a newer xid having already been reported) could be fixed within the proposed design, but your intention to have an "independent correctness backstop" is the right thing to do.

So I was thinking about another approach, which:

* is still not too tricky to implement
* is easy to understand
* does not rely on hot_standby_feedback for correctness, only for efficiency
* can be used with any kind of index
* does not generate a lot of WAL

Let's add a new type of WAL record meaning "some index killed-tuple hint bits were set according to RecentGlobalXmin=x" (without specifying a page or even a relation). Let's call 'x' 'LastKilledIndexTuplesXmin' and track it in standby memory. The record is sent only when wal_log_hints=true. If a hint causes an FPW, the record is sent before the FPW record. It is also not required to write such a record every time the primary marks an index tuple as dead; it is needed only when 'LastKilledIndexTuplesXmin' changes (moves forward).

On the standby, such a record is used to cancel queries. If a transaction is executed with "ignore_killed_tuples==true" (set on snapshot creation) and its xid is less than the received LastKilledIndexTuplesXmin, the query is simply cancelled (because it could rely on an invalid hint bit).

So, technically, it should be correct to use hints received from the primary to skip tuples according to MVCC, but "the conflict rate goes through the roof". To avoid any real conflicts, the standby sets

    ignore_killed_tuples = (hot_standby_feedback is on)
        AND (wal_log_hints is on on the primary)
        AND (the standby's new snapshot xid >= the last LastKilledIndexTuplesXmin received)
        AND (hot_standby_feedback is reported directly to the primary)

So the hot_standby_feedback loop effectively eliminates any conflicts (because LastKilledIndexTuplesXmin is technically RecentGlobalXmin in that case). But if feedback is broken for some reason, the query cancellation logic will keep everything safe.

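
To make the rules above concrete, here is a rough standalone sketch in C. It is not a patch and not PostgreSQL code: the struct, the function names, and the simplified xid comparison are all made up for illustration, and real code would use the existing TransactionId helpers (which also handle special xids). It only shows the three decisions: when a new record is needed, when a new standby snapshot may trust killed bits, and when a running query has to be cancelled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical standby-side state; all field names are illustrative. */
typedef struct StandbyHintState
{
    TransactionId last_killed_index_tuples_xmin; /* last value received via WAL */
    bool hot_standby_feedback;        /* feedback is enabled on this standby */
    bool primary_wal_log_hints;       /* primary runs with wal_log_hints=on */
    bool feedback_direct_to_primary;  /* feedback is not relayed via a cascade */
} StandbyHintState;

/*
 * Simplified wraparound-aware "a precedes b" (assumption: both xids are
 * normal; the real comparison also special-cases permanent xids).
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Move the tracked value forward only. Returns true when it advanced,
 * i.e. when the primary would have to emit a new WAL record (or when the
 * standby has just accepted a newer value).
 */
static bool
advance_last_killed_xmin(TransactionId *last, TransactionId candidate)
{
    if (xid_precedes(*last, candidate))
    {
        *last = candidate;
        return true;
    }
    return false;
}

/* Decide ignore_killed_tuples for a new snapshot on the standby. */
static bool
snapshot_may_ignore_killed_tuples(const StandbyHintState *s,
                                  TransactionId snapshot_xid)
{
    return s->hot_standby_feedback
        && s->primary_wal_log_hints
        && s->feedback_direct_to_primary
        && !xid_precedes(snapshot_xid, s->last_killed_index_tuples_xmin);
}

/*
 * Backstop: a query that trusts killed bits must be cancelled if its xid
 * is older than a newly received LastKilledIndexTuplesXmin.
 */
static bool
must_cancel_query(bool ignore_killed_tuples,
                  TransactionId snapshot_xid,
                  TransactionId received_xmin)
{
    return ignore_killed_tuples && xid_precedes(snapshot_xid, received_xmin);
}
```

Note that must_cancel_query() does not look at the feedback settings at all. That is the point of the backstop: it stays correct even when the feedback loop is broken.
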
For correctness, LastKilledIndexTuplesXmin (and, as a consequence, RecentGlobalXmin) should move only forward. To set killed bits on the standby, we should check tuple visibility against the last LastKilledIndexTuplesXmin received. This is just like the primary setting these bits according to its own state, so it is even safe to transfer them to another standby.

Does it look better now?

Thanks,
Michail.