On Mon, 2006-04-03 at 09:55 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Thinking about this some more, I ask myself: why is it we log index
> > inserts at all? We log heap inserts, which contain all the information
> > we need to replay all index inserts also, so why bother?
> I don't see any workable half measures.

Yep, looks that way.

> (1) We can't run user-defined functions during log replay. Quite
> aside from any risk of nondeterminism, the normal transaction
> infrastructure isn't functioning in that environment.

Didn't think of that one, but we could special-case it.

> (2) Some of the index code is itself deliberately nondeterministic.
> I'm thinking in particular of the move-right-or-not choice in
> _bt_insertonpg() when there are many equal keys, but randomization is
> in general a useful algorithmic technique that we'd have to forswear.

Understood.

> (3) In the presence of concurrency, the sequence of heap-insert WAL
> records isn't enough info, because it doesn't tell you what order the
> index inserts occurred in. The btree code, at least, is sufficiently
> concurrent that even knowing the sequence of leaf-key insertions isn't
> full information --- it's not hard to imagine cases where decisions
> about where to split upper-level pages are dependent on which process
> manages to obtain lock on a page first.

In the presence of concurrency, it could be the OS that decides who gets
first crack at a page, purely because of scheduling. We can never assume
anything there, so that's definitely a killer argument.

But it does open up an opportunity if we locked the table
AccessExclusive for a COPY LOCK into an existing table with indexes.
One day maybe, but not interesting enough yet.

Best Regards, Simon Riggs

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
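As an aside for readers following the thread: point (3) can be illustrated with a toy sketch. This is not PostgreSQL code — it is a hypothetical miniature "index" with two-key pages and a naive split rule — but it shows the core problem: the same multiset of keys, arriving in two different orders (as two interleaved backends might produce), yields two different physical page layouts, so the heap-insert WAL stream alone cannot pin down the index's on-disk shape.

```python
# Toy illustration (NOT PostgreSQL code): physical page layout of a
# miniature index depends on key arrival order, even though the final
# key set is identical.

PAGE_CAPACITY = 2  # tiny pages so splits happen quickly


def insert(pages, key):
    """Insert key into the first page whose last key is >= key
    (or into the last page), splitting any page that overflows."""
    for i, page in enumerate(pages):
        if not page or key <= page[-1] or i == len(pages) - 1:
            page.append(key)
            page.sort()
            if len(page) > PAGE_CAPACITY:
                mid = len(page) // 2
                pages[i:i + 1] = [page[:mid], page[mid:]]
            return


def build(keys):
    pages = [[]]
    for k in keys:
        insert(pages, k)
    return pages


# Same key multiset, two arrival orders (as two concurrent backends
# might interleave their inserts):
layout_a = build([1, 2, 3, 4])  # -> [[1], [2], [3, 4]]
layout_b = build([4, 3, 2, 1])  # -> [[1, 2], [3, 4]]

# The logical contents agree, but the physical layouts differ:
assert sorted(sum(layout_a, [])) == sorted(sum(layout_b, []))
assert layout_a != layout_b
```

Since WAL replay must reproduce index pages byte-for-byte (a torn-page repair may overwrite a page with its logged image), "same keys, different layout" is exactly the failure mode that forces the index inserts to be logged in their own right.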