On Mon, Jan 30, 2017 at 7:20 PM, Pavan Deolasee <pavan.deola...@gmail.com>
> Based on my investigation so far and the evidence that I've collected,
> what probably happens is that after a relcache invalidation arrives at the
> new transaction, it recomputes the rd_indexattr but based on the old,
> cached rd_indexlist. So the rd_indexattr does not include the new columns.
> Later rd_indexlist is updated, but the rd_indexattr remains what it was.
I was kinda puzzled this report did not get any attention, though it's
clearly a data corruption issue. Anyways, now I know why this is happening
and can reproduce this very easily via debugger and two sessions.
The offending code is RelationGetIndexAttrBitmap(). Here is the sequence of
Taking the same table as in the report:
CREATE TABLE cic_tab_accounts (
aid bigint UNIQUE,
1. Session S1 calls DROP INDEX pgb_a_aid1 and completes
2. Session S2 is in process of rebuilding its rd_indexattr bitmap (because
of previous relcache invalidation caused by DROP INDEX). To be premise,
assume that the session has finished rebuilding its index list. Since DROP
INDEX has just happened,
4793 indexoidlist = RelationGetIndexList(relation);
3. S1 then starts CREATE INDEX CONCURRENTLY pgb_a_aid1 ON
4. S1 creates catalog entry for the new index, commits phase 1 transaction
and builds MVCC snapshot for the second phase and also finishes the initial
5. S2 now proceeds. Now notice that S2 had computed the index list based on
the old view i.e. just one index.
The following comments in relcahce.c are now significant:
4800 * Copy the rd_pkindex and rd_replidindex value computed by
4801 * RelationGetIndexList before proceeding. This is needed because
4802 * relcache flush could occur inside index_open below, resetting
4803 * fields managed by RelationGetIndexList. (The values we're
4804 * will still be valid, assuming that caller has a sufficient lock
4805 * the relation.)
That's what precisely goes wrong.
6. When index_open is called, relcache invalidation generated by the first
transaction commit in CIC gets accepted and processed. As part of this
invalidation, rd_indexvalid is set to 0, which invalidates rd_indexlist too
7. But S2 is currently using index list obtained at step 2 above (which has
only old index). It goes ahead and computed rd_indexattr based on just the
8. While relcahce invalidation processing at step 6 cleared rd_indexattr
(along with rd_indexvalid), it is again set at
4906 relation->rd_indexattr = bms_copy(indexattrs);
But what gets set at step 8 is based on the old view. There is no way
rd_indexattr will be invalidated until S2 receives another relcache
invalidation. This will happen when the CIC proceeds. But until then, every
update done by S2 will generate broken HOT chains, leading to data
I can reproduce this entire scenario using gdb sessions. This also explains
why the patch I sent earlier helps to solve the problem.
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services