On 2/9/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
On Wed, 2007-02-07 at 14:17 -0500, Tom Lane wrote:
> ISTM we could fix that by extending the index VACUUM interface to
> include two concepts: aside from "remove these TIDs when you find them",
> there could be "replace these TIDs with those TIDs when you find them".
> This would allow pointer-swinging to one of the child tuples, after
> which the old root could be removed. This has got the same atomicity
> problem as for CREATE INDEX, because it's the same thing: you're
> de-HOT-ifying the child. So if you can solve the former, I think you
> can make this work too.

This is looking like the best option out of the many, since it doesn't
have any serious restrictions or penalties. Let's see what Pavan thinks,
since he's been working on this aspect.
ISTM that there are two related issues that we need to solve to make
progress.

- We must make de-HOTifying, or CHILLing, crash safe
- Concurrent index scans should work correctly with CHILLing operations

I think the first issue can be addressed along the lines of what Heikki
suggested. We can CHILL one tuple at a time. I am thinking of a two-step
process. In the first step, the root tuple and the heap-only tuple (which
needs CHILLing) are marked with a special flag, CHILL_IN_PROGRESS. This
operation is WAL logged. We then insert the appropriate index entries for
the tuple under consideration. In the second step, the HEAP_UPDATED_ROOT
and HEAP_ONLY_TUPLE flags on the heap tuples are adjusted and the
CHILL_IN_PROGRESS flags are cleared.

During normal operations, if the CHILL_IN_PROGRESS flag is found set, we
might need to do some more work to figure out whether the index insert
operations were successful or not. If we find that index entries are
missing for the tuple being CHILLed, those can be added at that point and
the flags set/reset appropriately.

The second problem, concurrent index scans, seems a bit more complex. We
need a mechanism that ensures no tuples are missed and no tuple is
returned twice. Since CHILLing a tuple adds a new access path to the
tuple from the index, a concurrent index scan may return a tuple twice.

How about grabbing an AccessExclusiveLock during the CHILLing operation?
This would prevent any concurrent index scans. Since CHILLing a large
table can take a long time, the operation can be spread across time with
periodic acquire/release of the lock. This would prevent starvation of
other backends. Since CHILLing is required only for CREATE INDEX and
stub-cleanup, I am assuming that it's OK for it to be lazy in nature.

Thanks,
Pavan

--
EnterpriseDB http://www.enterprisedb.com
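For illustration, the two-step process described in this message can be sketched as a toy Python model. This is only a sketch under stated assumptions, not PostgreSQL code: the flag names come from the message, but the bit values, the `HeapTuple` and `Table` classes, the function names and the `crash_after` parameter are all hypothetical. "Adjusting" the flags in step two is assumed here to mean clearing them, and each index is modelled as a plain set of TIDs.

```python
from dataclasses import dataclass, field

# Flag names are from the message; the bit values are invented for this sketch.
HEAP_UPDATED_ROOT = 0x1
HEAP_ONLY_TUPLE = 0x2
CHILL_IN_PROGRESS = 0x4


@dataclass
class HeapTuple:
    tid: int
    flags: int = 0


@dataclass
class Table:
    root: HeapTuple
    child: HeapTuple  # the heap-only tuple being CHILLed
    # Two indexes, each modelled as a set of TIDs (an assumption of this sketch).
    indexes: list = field(default_factory=lambda: [set(), set()])
    wal: list = field(default_factory=list)


def chill_step1(t):
    """Step 1: flag both tuples and WAL-log that, before touching any index."""
    t.root.flags |= CHILL_IN_PROGRESS
    t.child.flags |= CHILL_IN_PROGRESS
    t.wal.append(("chill_begin", t.root.tid, t.child.tid))


def insert_index_entries(t, crash_after=None):
    """Insert the child's TID into each index.

    crash_after=N stops after N indexes to imitate a crash; returns False then.
    """
    for n, index in enumerate(t.indexes):
        if crash_after is not None and n >= crash_after:
            return False
        index.add(t.child.tid)
    return True


def chill_step2(t):
    """Step 2: adjust the HOT flags (assumed: clear them) and clear the marker."""
    t.root.flags &= ~HEAP_UPDATED_ROOT
    t.child.flags &= ~HEAP_ONLY_TUPLE
    t.root.flags &= ~CHILL_IN_PROGRESS
    t.child.flags &= ~CHILL_IN_PROGRESS


def finish_interrupted_chill(t):
    """What a later backend might do on finding CHILL_IN_PROGRESS still set."""
    if not (t.child.flags & CHILL_IN_PROGRESS):
        return
    for index in t.indexes:
        # Adding to a set is idempotent: existing entries stay, missing ones appear.
        index.add(t.child.tid)
    chill_step2(t)
```

A run that "crashes" after the first of two index inserts would leave CHILL_IN_PROGRESS set with one entry missing; calling `finish_interrupted_chill` afterwards adds the missing entry and resets the flags. Whether step two itself needs a WAL record is not stated in the message, so the sketch leaves it out.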