"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> CREATE INDEX > CREATE INDEX CONCURRENTLY > ------------------------- > I'm not very familiar with how these, so I'll just shut up.. Here is an addendum for how HOT handles CREATE INDEX and CREATE INDEX CONCURRENTLY.
CREATE INDEX There are two strategies for CREATE INDEX depending on whether we're doing it concurrently or not. For a regular locking create index we take the two precautions: 1) When indexing a table we must take care that the index contains pointers only to Root tuples. These are the non-HOT tuples which everyone else expects to have index pointers. However the key value we want to index is actually the key at the *end* of the chain, ie, the most recent tuple on the index chain. Note that this means we could create broken HOT chains because the index key of the new index could have changed in the HOT chain. If we find any HOT-updated tuple with is RECENTLY_DEAD or DELETE_IN_PROGRESS we ignore it assuming that we'll also come across the *end* of the update chain and index that instead. 2) If we ignore any HOT-updated tuples above then we mark the index as unusable by older transactions for which those tuples might be visible. This ensures that nobody for whom the older tuples in a broken HOT chain are visible will use the index. We do this by putting our transaction-id in pg_index.indcreatexid after building the index. Queries check whether that indcreatexid is in their serializable snapshot, if it isn't then the index is not usable for that query. This means that transactions started before the create index commits will not get the benefit of the new index even if they first scan the table only after the index exists. However new transactions get the benefit of the new index immediately but they will always follow the HOT update chain since the old tuples with the possibly different keys will never be visible to them. The tricky case arises with queries executed in the same transaction as the CREATE INDEX. In the case of a new table created within the same transaction such as with pg_dump -1 the index will always be usable because there will never be any HOT update chains so the indcreatexid will never be set. Also in the case of a read-committed transaction new queries will always be able to use the index. Only in the case of a serializable transaction building an index on an existing table with HOT updates in it will the index go unused. CREATE INDEX CONCURRENTLY In the concurrent case we take a different approach. 1) we create the pg_index entry immediately, before we scan the table. The pg_index is marked as "not ready for inserts". Then we commit and wait for any transactions which have the table open to finish. This ensures that no *new* HOT updates will change the key value for our new index. 2) We wait out any transactions which had the table open. Then we build the index with a snapshot. Because we waited out anyone who might have been around before the index was created any tuples seen in the snapshot will have only valid HOT chains following them. They may still have older HOT updates behind them which are broken. As above we point the index pointers at the Root of the HOT-update chain but we use the key from the end of the chain. 4) We mark the index open for inserts (but still not ready for reads) then we again wait out any transactions which had the table open. Then we take a second reference snapshot and validate the index. This searches for tuples missing from the index -- but it again has to look bup the root of the HOT chains and search for those tuples in the index. 4) Then we wait until *every* transaction in progress in the validate_index reference snapshot is finished. This ensures that nobody is alive any longer who could possibly see any of the tuples in a broken HOT chain. Glossary: Broken HOT Chain . A HOT chain in which the key value for an index has changed. This is not allowed to occur normally but if a new index is created it can happen. In that case various strategies are used to ensure that no transaction for which the older tuples are visible can use the index.
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend