On 3/22/07, Tom Lane <[EMAIL PROTECTED]> wrote:
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> When CREATE INDEX starts, it acquires ShareLock on the table.
> At this point we may have one or more HOT-update chains in the
> table. Tuples in this chain may be visible to one or more running
> transactions. The fact that we have ShareLock on the table means
> that all tuples in the chain except the one at the head either
> RECENTLY_DEAD or were UPDATEd by the same transaction
> that is now running CREATE INDEX.
This logic seems pretty questionable to me in view of the recent VACUUM
FULL bug fixes. We now know that an update chain can contain tuples that
appear DEAD but are later than ones that are RECENTLY_DEAD. How are
you defining a HOT chain exactly --- will it be just a part of an
update chain that consists only of contiguous tuples that appear live or
RECENTLY_DEAD to the CREATE INDEX transaction?
HOT-chain is something which contains contiguous tuples that share
the same index keys for all the existing indexes and share a single
index entry. It does not really matter if there are RECENTLY_DEAD
tuples that appear before a DEAD tuple because from VACUUM FULL
bug fix we know that they are DEAD too.
My argument is that its enough to index only the LIVE tuple which
is at the end of the chain if we don't use the new index for queries
in transactions which were started before CREATE INDEX. I am
proposing to do that by storing an xid in the pg_index row. A
special case is where a tuple is UPDATEd multiple times by
the same transaction which is also creating the index, in which case
there are more than one LIVE versions of the tuple. But again
we are safe by indexing only the latest version because all other
versions would be invisible (even to us) once CREATE INDEX commits.
In fact, the serializable transactions started before CREATE INDEX
> can not anyway see the index so all this is done to handle
> read-committed transactions.
You are laboring under an illusion that system catalog accesses are MVCC.
SnapshotNow does not behave that way: the system can see the new index
as soon as it's committed. (It had better, since it has to start
updating the index immediately, whether it's safe to scan it or not.)
I'm not sure whether that's fundamental to your argument or not, but
it's certainly wrong.
Oh, thanks for pointing that out. But thats certainly not fundamental
to the argument as you probably already guessed. The xid still controls
the usage of index for query planning, somewhat similar to "isindvalid"
flag for CREATE INDEX CONCURRENTLY.