Please see the HOT version 6.3 patch posted on pgsql-patches. I've implemented support for CREATE INDEX and CREATE INDEX CONCURRENTLY based on the recent discussions. The implementation is not yet complete and needs some more testing/work/discussion before we can start considering it for review.
One of the regression test case fails because CIC now works in three phases. In the first phase, we just create the catalog entry for the index and commit the transaction. If the index_build fails because of any error (say, unique key constraint) the index creation fails, but the catalog entry remains. CREATE INDEX: ----------------- The implementation is based on having an extra attribute in pg_index to track the transaction xid which created the index and then use that information to decide whether the newly created index should be used in a query or not. Here are couple of TODO items: Plan Invalidation: We decided to store transaction id of the top level transaction in the cached plan if one or more potentially useful indexes are not available while planning a query. And then replan if the current transaction id is different that the one stored with the plan. I'm not very well familiar with this code, so any suggestions how to do it in a clean way ? Making index available in the creating transaction: This is an important TODO item. We would like to make the index immediately available to the transaction which created it, if the transaction is running in read-committed mode. If the transaction is running in SERIALIZABLE mode, then we can't do much because we might have skipped one or more RECENTLY_DEAD tuples while building the index and hence index can not be used. The way we build index now is that we only index the tuple at the head of the HOT-chain. So there could be DELETE_IN_PROGRESS tuples (updated/deleted by the transaction which is creating the index) which we skipped while building the index. My question is, is there a case where this transaction may use the new index and still see those tuples ? I know that the DELETE_IN_PROGRESS tuples are visible if there are any open cursors. But then plans for these open cursors can not be changed until they are closed and reopened, isn't it ? Tom mentioned about recursive plpgsql functions where the outer instance can use an older snapshot. I tried that but could not produce a scenario where the outer instance could see the DELETE_IN_PROGRESS tuple if the tuple is updated in the inner instance. Can someone help me with an example where a read-committed transaction would use the newly created index and still see the DELETE_IN_PROGRESS tuple ? CREATE INDEX CONCURRENTLY: ------------------------------ One of the item which needs review and discussion is the handling on unique key checks while creating the index concurrently. We build the index in three phases. In the first phase, we just create the catalog entry and mark index invalid for inserts. This ensures that transactions started after that won't create HOT-chains that break the HOT property for the new index. In the second phase, we build the index by applying the reference snapshot to the heap tuples. In the third phase, we validate the index and insert any missing entries. In this phase, we only insert if index entry for the root tuple is missing. So there is just one insert operation which covers all the tuples in the HOT-chain. In order to check unique key violations, inside _bt_check_unique() function when a duplicate key is found, we follow the entire HOT-chain and check if any tuple in the chain is live. If so, unique key violation constraint is raised. IOW if any two HOT-chains share the same key and have one live tuple, unique key constraint is considered violated. Can anyone spot a hole in this logic ? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com