Please see the HOT version 6.3 patch posted on pgsql-patches.
I've implemented support for CREATE INDEX and CREATE INDEX
CONCURRENTLY based on the recent discussions. The implementation
is not yet complete and needs some more testing/work/discussion
before we can start considering it for review.
One of the regression test case fails because CIC now works in
three phases. In the first phase, we just create the catalog entry
for the index and commit the transaction. If the index_build fails
because of any error (say, unique key constraint) the index creation
fails, but the catalog entry remains.
The implementation is based on having an extra attribute in pg_index
to track the transaction xid which created the index and then use
that information to decide whether the newly created index should
be used in a query or not. Here are couple of TODO items:
We decided to store transaction id of the top level transaction in
the cached plan if one or more potentially useful indexes are
not available while planning a query. And then replan if the
current transaction id is different that the one stored with the
plan. I'm not very well familiar with this code, so any suggestions
how to do it in a clean way ?
Making index available in the creating transaction:
This is an important TODO item. We would like to make the
index immediately available to the transaction which created it,
if the transaction is running in read-committed mode. If the
transaction is running in SERIALIZABLE mode, then we can't do
much because we might have skipped one or more RECENTLY_DEAD
tuples while building the index and hence index can not be used.
The way we build index now is that we only index the tuple at the head
of the HOT-chain. So there could be DELETE_IN_PROGRESS
tuples (updated/deleted by the transaction which is creating the
index) which we skipped while building the index. My question
is, is there a case where this transaction may use the new index
and still see those tuples ? I know that the DELETE_IN_PROGRESS
tuples are visible if there are any open cursors. But then plans for
these open cursors can not be changed until they are closed
and reopened, isn't it ? Tom mentioned about recursive plpgsql
functions where the outer instance can use an older snapshot.
I tried that but could not produce a scenario where the outer instance
could see the DELETE_IN_PROGRESS tuple if the tuple is updated
in the inner instance. Can someone help me with an example where
a read-committed transaction would use the newly created index
and still see the DELETE_IN_PROGRESS tuple ?
CREATE INDEX CONCURRENTLY:
One of the item which needs review and discussion is the handling
on unique key checks while creating the index concurrently. We build
the index in three phases. In the first phase, we just create the catalog
entry and mark index invalid for inserts. This ensures that transactions
started after that won't create HOT-chains that break the HOT property
for the new index. In the second phase, we build the index by applying
the reference snapshot to the heap tuples. In the third phase, we
validate the index and insert any missing entries.
In this phase, we only insert if index entry for the root tuple is missing.
So there is just one insert operation which covers all the tuples in the
HOT-chain. In order to check unique key violations, inside
_bt_check_unique() function when a duplicate key is found, we follow
the entire HOT-chain and check if any tuple in the chain is live. If so,
unique key violation constraint is raised. IOW if any two HOT-chains
share the same key and have one live tuple, unique key constraint
is considered violated. Can anyone spot a hole in this logic ?