On Sat, Jan 14, 2012 at 12:42:02AM -0500, Robert Haas wrote:
> On Fri, Jan 13, 2012 at 8:02 PM, Noah Misch <n...@leadboat.com> wrote:
> > Simon spoke to the FPI side of the question.  For heap pages, the
> > XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET
> > TABLESPACE.  For the last, we will have already logged any PD_ALL_VISIBLE
> > bits through normal channels.  CLUSTER and VACUUM FULL never set
> > PD_ALL_VISIBLE or visibility map bits.  When, someday, they do, we might
> > emit a separate WAL record to force the recovery conflict.  However,
> > CLUSTER/VACUUM FULL already remove tuples still-visible to standby
> > snapshots without provoking a recovery conflict.  (Again only with
> > hot_standby_feedback=off.)
>
> Is the bit about CLUSTER/VACUUM FULL a preexisting bug?  If not, why not?

I suspect it goes back to 9.0, yes.  I'm on the fence regarding whether to
call it a bug or an unimplemented feature.  In any case, +1 for improving it.

> Other than that, it seems like we might be converging on a workable
> solution: if hot_standby_feedback=off, disable index-only scans for
> snapshots taken during recovery; if hot_standby_feedback=on, generate
> recovery conflicts when a snapshot's xmin precedes the youngest xmin
> on a page marked all-visible, but allow the use of index-only scans,
> and allow sequential scans to trust PD_ALL_VISIBLE.  Off the top of my
> head, I don't see a hole in that logic...

I wouldn't check hot_standby_feedback.  Rather, mirror what we do for
XLOG_HEAP2_CLEAN.  Unconditionally add an xid to xl_heap_visible bearing the
youngest xmin on the page (alternately, some convenient upper bound thereof).
Have heap_xlog_visible() call ResolveRecoveryConflictWithSnapshot() on that
xid.  Now, unconditionally trust PD_ALL_VISIBLE and permit index-only scans.

The user's settings of hot_standby_feedback, vacuum_defer_cleanup_age,
max_standby_streaming_delay and max_standby_archive_delay will drive the
consequential trade-off: nothing, query cancellation, or recovery delay.

nm
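
To make the proposed redo path concrete, here is a toy model of it in
standalone C.  It is only a sketch under stated assumptions, not PostgreSQL
source: every toy_* name is hypothetical, the record carries just a block
number and the proposed xid, and the conflict test (snapshot xmin at or before
the record's xid) is my reading of the rule above.  The real
ResolveRecoveryConflictWithSnapshot() also waits according to the
max_standby_*_delay settings before cancelling anything; the toy cancels
immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for xl_heap_visible, extended with the proposed xid. */
typedef struct
{
    uint32_t      block;       /* heap block being marked all-visible */
    TransactionId cutoff_xid;  /* youngest xmin on the page, or an upper bound */
} toy_xl_heap_visible;

/* Hypothetical stand-in for the snapshot of one standby query. */
typedef struct
{
    TransactionId xmin;
    bool          cancelled;
} toy_snapshot;

/* Wraparound-aware "a is older than b"; valid for normal xids only. */
static bool
toy_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Toy analogue of the conflict resolution step: cancel every snapshot whose
 * xmin is at or before cutoff_xid, since such a snapshot may not see every
 * tuple on the page as visible.  Returns the number of snapshots cancelled.
 */
static size_t
toy_resolve_conflict(TransactionId cutoff_xid, toy_snapshot *snaps,
                     size_t nsnaps)
{
    size_t ncancelled = 0;

    assert(snaps != NULL || nsnaps == 0);
    for (size_t i = 0; i < nsnaps; i++)
    {
        bool conflicts = snaps[i].xmin == cutoff_xid ||
                         toy_xid_precedes(snaps[i].xmin, cutoff_xid);

        if (conflicts && !snaps[i].cancelled)
        {
            snaps[i].cancelled = true;
            ncancelled++;
        }
    }
    return ncancelled;
}

/*
 * Toy redo routine: resolve conflicts first, unconditionally.  Only after
 * that would the page's all-visible bit be set, so surviving snapshots can
 * trust it.
 */
static size_t
toy_heap_xlog_visible(const toy_xl_heap_visible *rec, toy_snapshot *snaps,
                      size_t nsnaps)
{
    return toy_resolve_conflict(rec->cutoff_xid, snaps, nsnaps);
}
```

Note that the toy redo routine never looks at hot_standby_feedback.  With
feedback on, the master would hold back cleanup so that conflicting snapshots
are rare; with it off, the same code path simply cancels more queries.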