Gregory Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> Unless someone's got a brilliant idea, my recommendation at this point >> is that we restrict the patch to building only non-unique indexes.
> I assume you'll add the check? Yeah, I'll take care of it. At the moment it may be moot, because I've realized that validate_index doesn't work anyway. It is scanning the index and then assuming that any tuple inserted into the index subsequent to that scan will still be INSERT_IN_PROGRESS when the heapscan reaches it. That's completely bogus of course --- the tuple could be committed live or even already deleted by the time the heapscan reaches it. This would result in making duplicate index entries, which is unacceptable even in a nonunique index (it'd cause indexscans to return the same row twice). I'm trying to work out whether we can fix this by taking an MVCC snapshot before we scan the index, and then inserting all tuples we find in the heap that are live according to the snap but are not present in the indexscan data. There are still race conditions in this but I think sufficient delay would fix them. Considerations: * If a tuple is good according to the snap, it must have been inserted by an xact that committed before the snap, therefore there is no still-in-progress index insertion happening for it. So if it's not in the sorted indexscan data we know we must insert it. If it is in the indexscan data, of course we don't have to. * If a tuple is not good according to the snap, there are three possibilities: ** It was never committed good at all. We need not index it. ** It was inserted by a transaction that committed post-snap. We assume that that transaction has or will index it. ** It was deleted by a transaction that committed pre-snap. We assume we need not index it because no remaining transaction will care about it. The trick is that we must wait long enough to ensure that those two assumptions are good. We already know about waiting long enough to ensure that all live transactions are aware of the index, which takes care of the first assumption as long as we take the snap after that wait. However, the second assumption means that we must be sure there are no other backends with serializable snapshots older than the snap we are using for this. I think this means that we wait for all xacts to be aware of our index, take the reference snap, then wait for all xacts except ours to die --- without exception --- and then we can get on with the second phase of the work. It might be OK to do the indexscan and sort before we do the second wait, though, so we could get at least something done. Comments? Have I missed any case for a tuple to fail the snap? Does this analysis change our conclusions about whether we can support unique index builds? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings