On 9/12/07, Tom Lane <[EMAIL PROTECTED]> wrote: > > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Current README.HOT version: > > ftp://momjian.us/pub/postgresql/mypatches/README.HOT > > I've done some further editorializing and wordsmithing on this, > and marked with XXX some areas that seem like they're still red flags > (or at least inadequately described).
Thanks a lot. > CREATE INDEX CONCURRENTLY > ------------------------- > > In the concurrent case we take a different approach. We create the > pg_index entry immediately, before we scan the table. The pg_index entry > is marked as "not ready for inserts". Then we commit and wait for any > transactions which have the table open to finish. This ensures that no > *new* HOT updates will change the key value for our new index, because all > transactions will see the existence of the index and will respect its > constraint on which updates can be HOT. > > After waiting for transactions which had the table open, we build the > index with a snapshot. Because we waited for anyone who existed before > the index was created, any tuples seen in the snapshot will have only > valid forward-growing HOT chains. (They might still have older HOT > updates behind them which are broken.) As above, we point the index > pointer at the root of the HOT-update chain but we use the key from the > end of the chain. > > We mark the index open for inserts (but still not ready for reads) then > we again wait for transactions which have the table open. Then we take > a second reference snapshot and validate the index. This searches for > tuples missing from the index --- but it again has to look up the root > of the HOT chains and search for those tuples in the index. > > Then we wait until every transaction that could have a snapshot older than > the second reference snapshot is finished. This ensures that nobody is > alive any longer who could possibly see any of the older tuples in a > broken HOT chain. (This condition is actually stronger than what is > needed to protect the broken HOT chains --- the problem we are dealing > with is tuples inserted after we took our first reference snapshot. > Those will be part of valid HOT-chains, but might not be in the index.) > > XXX the above being the case, why should we bother with all this extra > mechanism? Seems we could dispense with "not ready for inserts". > > The "not ready for inserts" logic is necessary because we have separated the steps of creating the catalog entry for the new index and actual bottom-up building of the index into separate transactions. My worry was once the catalog entry is committed, an index insert won't work with the concurrent index build. I guess I also need to explain why we need to separate the catalog entry creation and index building activities into separate transactions. The primary purpose is to handle existing broken HOT chains and avoid any new broken chains being created while we are building the index. Without this additional step, while we are building the index in the first phase using a snapshot, its possible that a broken HOT chain may get created forwarding to the tuple that we index. For example: A table with two columns a and b. Say, we already have an index on 'a' and creating a new index on 'b'. Say, there is a tuple (0, 1): (1, 11) in the table where (0, 1) in the line pointer of the tuple. This is the visible tuple when we took snapshot in the first phase and so we shall index this tuple with key 11 and TID (0, 1). But before the first phase finishes, the tuple is updated to (0, 2): (1, 22). This would be a HOT update resulting in a broken chain. The root line pointer for the new tuple is still (0, 1). CIC now waits for the updating transaction, takes a fresh snapshot and starts the second phase. In the second phase, it must index the new tuple (1, 22) using the key 22 and TID (0, 1) because thats the root line pointer for the chain. Now there is already an index entry in the new index with key 11 and TID (0, 1). Inserting another index entry with a different key and the same TID would certainly corrupt the index. We solve this issue by first creating a catalog entry for the new index and committing the transaction. We then wait out for any conflicting transactions before taking snapshot for building the index in the first phase. This arrangement guarantees that the snapshot used in the first phase would always satisfy tuples at the head of the existing HOT chains. At the same time, no new broken HOT chains are created while we are building the index because the new index (even though not yet ready) is consulted for all subsequent UPDATEs. Hope this helps. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com