There are few things I realized over the weekend while going
through the code:
1. It looks like a bad idea to use ALTER TABLE .. to chill a table
becuase ALTER TABLE takes AccessExclusive lock on the table.
But it would still be a good idea to have ALTER TABLE .. to turn
2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade
its lock anyways and is prone to deadlock. So as long as we don't
create new deadlock scenarios, we should be fine.
3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction.
So its should be acceptable if we run CHILL as a seperate transaction.
4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and
thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and
VACUUM FULL. We can do the same for CHILL to avoid any unnecessary
race conditions between all of these.
So here is my plan. Let me know your suggestions/comments/objections.
Changes to pg_class and new DDLs:
We add two boolean attributes to pg_class: hot_update and hot_fetch.
We introduce two DDLs to turn HOT on/off. Note that the DDL
itself does not CHILL the table, but only affects the
postgres=# ALTER TABLE <tblname> ENABLE HOT;
postgres=# ALTER TABLE <tblname> DISABLE HOT;
These DDLs would acquire AccessExclusive lock on the table and
set hot_update to true/false using simple_heap_update()
CREATE INDEX [CONCURRENTLY]:
If a HEAP_ONLY tuple is found, error out with a HINT to run
CHILL on the table and then retry.
If CHILL_IN_PROGRESS tuple is found, VACUUM is required on
We introduce a new command to chill a table. The syntax for the
same could be:
postgres=# CHILL [VERBOSE] <tblname>;
UPDATE/INSERT/SELECT would work while the table is being chilled.
But CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL
would be locked out. As a side-effect, HOT-updates are turned off on the
table and explicit ALTER TABLE ENABLE HOT is required to turn
HOT-updates on again.
Here is the algoirthm to CHILL table.
1. Check if CHILL is running inside a transaction block, error
out if so.
2. Start a new transaction
3. Acquire ShareUpdateExclusiveLock on the relation. This would
allow concurrent SELECT/INSERT/UPDATE, but lock out concurrent
CHILL, VACUUM [FULL] and CREATE INDEX [CONCURRENTLY]
4. Set hot_update to false and update pg_class using
5. Acquire ShareUpdateExclusiveLock for the entire session.
6. Commit the transaction
7. Start a new transaction
8. Wait for all transactions in the current snapshot to finish.
This would ensure that there are no HOT-updates possible further
9. Start the first heap scan, CHILL HEAP_ONLY tuples by inserting
appropriate index entries and setting CHILL_IN_PROGRESS flag.
WAL log the operation
10. If a CHILL_IN_PROGRESS is already set on the tuple, we can't
be sure whether the corresponding index entry already exists
or not. One option is to error out and force VACUUM on the table.
Alternatively, the index_insert can be enhanced to check if a
the same entry already exists.
11. When the entire heap is chilled, set hot_fetch to false
and update pg_class using simple_heap_update()
12. Commit the transaction
13. Start a new transaction
14. Wait for all transactions in the current snapshot to finish.
This would ensure that all the subsequent index scans would
only use direct path from the index.
15. Start second heap scan. Reset CHILL_IN_PROGRESS, HEAP_ONLY
and HOT_UPDATED flags. WAL log the operations. We may not need
this, but we can revisit this later to optimize WAL logging.
16. When the second scan is complete, set hot_fetch to
true and update pg_class using simple_heap_update(). There are
no HOT_UPDATED tuples in the heap at this moment, but we should
reset the state neverthless.
17. Commit the transaction.
If the CHILL command crashes before completing the operation,
we might be left with hot_update/hot_fetch turned OFF. Administrative
command is needed to turn them ON again. But there won't be any
correctness problems in the meantime.
The uncleaned tuples left with CHILL_IN_PROGRESS flags would
require VACUUM for cleanup.
If hot_fetch is true, we ignore direct paths from the
index to HEAP_ONLY tuples
If hot_fetch is false, we ignore HOT_UPDATED flags
and only use direct paths from the index.
If a CHILL_IN_PROGRESS flag found, collect that tuple for
index removal irrespective of whether the tuple is DEAD
or not. If the tuple is not DEAD, only the CHILL_IN_PROGRESS
flag is reset in the second pass.
Is the plan acceptable ? If there are no objections to the
algorithms or the behavior in general, I would start working
on this with a target of feature freeze.
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not