There is one more (known) stop-ship problem in SPGiST, which I'd kind of like to get out of the way now before I let my knowledge of that code get swapped out again. This is that SPGiST is unsafe for use by hot standby slaves.
The problem comes from "redirect" tuples, which are short-lifespan objects that replace a tuple that's been moved to another page. A redirect tuple can be recycled as soon as no active indexscan could be "in flight" from the parent index page to the moved tuple. SPGiST implements this by marking each redirect tuple with the XID of the creating transaction, and assuming that the tuple can be recycled once that XID is below the OldestXmin horizon (implying that all active transactions started after it ended). This is fine as far as transactions on the master are concerned, but there is no guarantee that the recycling WAL record couldn't be replayed on a hot standby slave while there are still HS transactions that saw the old state of the parent index tuple.

Now, btree has a very similar problem with deciding when it's safe to recycle a deleted index page: it has to wait out transactions that could be in flight to the page, and it does that by marking deleted pages with XIDs. I see that the problem has been patched for btree by emitting a special WAL record just before a page is recycled. However, I'm a bit nervous about copying that solution, because the details are a bit different. In particular, I see that btree marks deleted pages with ReadNewTransactionId() --- that is, the next-to-be-assigned XID --- rather than the XID of the originating transaction, and then it subtracts one from the XID before sending it to the WAL stream. The comments about this are not clear enough for me, and so I'm wondering whether it's okay to use the originating transaction XID in a similar way, or if we need to modify SPGiST's rule for how to mark redirection tuples.

I think that the use of ReadNewTransactionId is because btree page deletion happens in VACUUM, which does not have its own XID; this is unlike the situation for SPGiST where creation of redirects is caused by index tuple insertion, so there is a surrounding transaction with a real XID.
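To make the two marking rules concrete, here is a minimal standalone sketch in C. It is not the actual SPGiST or btree code: the type Xid and the functions xid_precedes, redirect_is_recyclable and xid_retreat are hypothetical stand-ins, and the only assumptions are that XIDs are 32-bit counters compared modulo 2^32 and that values below 3 are reserved. It shows the master-side recycling test described above and the "next XID minus one" step; it says nothing about what the standby does with the value.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;                   /* hypothetical stand-in for a 32-bit XID */

#define FIRST_NORMAL_XID ((Xid) 3)      /* assumption: XIDs below this are reserved */

/* Wraparound-aware "a is older than b", valid for two normal XIDs. */
bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Master-side rule from the text: a redirect tuple may be recycled once
 * the XID that created it is older than the OldestXmin horizon.
 */
bool
redirect_is_recyclable(Xid redirect_xid, Xid oldest_xmin)
{
    return xid_precedes(redirect_xid, oldest_xmin);
}

/*
 * btree-style adjustment: step back by one from the next-to-be-assigned
 * XID, skipping over the reserved values when the counter wraps.
 */
Xid
xid_retreat(Xid xid)
{
    do
    {
        xid--;
    } while (xid < FIRST_NORMAL_XID);
    return xid;
}
```

With these definitions, a redirect created by XID 100 is recyclable once OldestXmin has reached 101 or later, and a page stamped with next XID 101 would be reported to the WAL stream as 100. Whether the first of those values is safe to hand to the standby-side conflict check is exactly the open question.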
But it's not clear to me how GetConflictingVirtualXIDs makes use of the limitXmin and whether a live XID is okay to pass to it, or whether we actually need "next XID - 1". Info appreciated.

regards, tom lane