"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:

> -------------------------
> I'm not very familiar with how these, so I'll just shut up..

Here is an addendum for how HOT handles CREATE INDEX and CREATE INDEX


There are two strategies for CREATE INDEX depending on whether we're doing it
concurrently or not. For a regular locking create index we take the two

1) When indexing a table we must take care that the index contains pointers
   only to Root tuples. These are the non-HOT tuples which everyone else
   expects to have index pointers.

   However the key value we want to index is actually the key at the *end* of
   the chain, ie, the most recent tuple on the index chain. Note that this
   means we could create broken HOT chains because the index key of the new
   index could have changed in the HOT chain.

   If we find any HOT-updated tuple with is RECENTLY_DEAD or
   DELETE_IN_PROGRESS we ignore it assuming that we'll also come across the
   *end* of the update chain and index that instead.

2) If we ignore any HOT-updated tuples above then we mark the index as
   unusable by older transactions for which those tuples might be visible.
   This ensures that nobody for whom the older tuples in a broken HOT chain
   are visible will use the index.

   We do this by putting our transaction-id in pg_index.indcreatexid after
   building the index. Queries check whether that indcreatexid is in their
   serializable snapshot, if it isn't then the index is not usable for that

   This means that transactions started before the create index commits will
   not get the benefit of the new index even if they first scan the table
   only after the index exists. However new transactions get the benefit of
   the new index immediately but they will always follow the HOT update
   chain since the old tuples with the possibly different keys will never be
   visible to them.

   The tricky case arises with queries executed in the same transaction as
   the CREATE INDEX. In the case of a new table created within the same
   transaction such as with pg_dump -1 the index will always be usable
   because there will never be any HOT update chains so the indcreatexid
   will never be set. Also in the case of a read-committed transaction new
   queries will always be able to use the index. Only in the case of a
   serializable transaction building an index on an existing table with HOT
   updates in it will the index go unused.


In the concurrent case we take a different approach.

1) we create the pg_index entry immediately, before we scan the table. The
   pg_index is marked as "not ready for inserts". Then we commit and wait for
   any transactions which have the table open to finish. This ensures that no
   *new* HOT updates will change the key value for our new index.

2) We wait out any transactions which had the table open. Then we build the
   index with a snapshot. Because we waited out anyone who might have been
   around before the index was created any tuples seen in the snapshot will
   have only valid HOT chains following them. They may still have older HOT
   updates behind them which are broken. As above we point the index pointers
   at the Root of the HOT-update chain but we use the key from the end of the

4) We mark the index open for inserts (but still not ready for reads) then we
   again wait out any transactions which had the table open. Then we take a
   second reference snapshot and validate the index. This searches for tuples
   missing from the index -- but it again has to look bup the root of the HOT
   chains and search for those tuples in the index.

4) Then we wait until *every* transaction in progress in the validate_index
   reference snapshot is finished. This ensures that nobody is alive any
   longer who could possibly see any of the tuples in a broken HOT chain.


Broken HOT Chain
 .     A HOT chain in which the key value for an index has changed. This is
 not allowed to occur normally but if a new index is created it can happen. In
 that case various strategies are used to ensure that no transaction for which
 the older tuples are visible can use the index.
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to