Oooh ... I can't be sure that this is what's biting you, but I definitely see a bug that seems to match the symptoms. As the comments in index.c point out, CREATE INDEX CONCURRENTLY works like this:
* validate_index() works by first gathering all the TIDs currently in the * index, using a bulkdelete callback that just stores the TIDs and doesn't * ever say "delete it". (This should be faster than a plain indexscan; * also, not all index AMs support full-index indexscan.) Then we sort the * TIDs, and finally scan the table doing a "merge join" against the TID list * to see which tuples are missing from the index. The scan is done using the regular heapscan code, which in 8.3 has been modified to enable "synchronized scanning", which means it might start from the middle of the table and wrap around. If that happens, the "merge join" will get totally confused because it is expecting the tuples to be returned in increasing ctid order. This will result in misidentifying a bunch of TIDs as not being in the table, allowing duplicate entries to be made in the index. And both of the misbehaviors you originally showed can be explained by duplicate index entries (actual or attempted). Furthermore, the first duplicate TIDs to be entered will tend to be low-numbered TIDs, which explains why you were consistently getting GIN complaints about low-numbered TIDs, which I was having a hard time thinking of a mechanism for otherwise. I can now reproduce the failure: the trick is to get the syncscan start pointer to not be on page zero. For example, -- load up table begin; declare c cursor for select id from test; fetch 10000 from c; commit; CREATE INDEX CONCURRENTLY cluster_test ON public.test USING gin (tsv); ERROR: item pointer (0,1) already exists I think it's okay for CREATE INDEX CONCURRENTLY to use bulk-read access strategy (that is, seqscan using a limited number of buffers), but it has to be able to force the scan to start at page zero. Right now, heapam.c doesn't offer any API to control this, but we can certainly add one. I wonder whether there are any other places that are silently assuming that heapscans start from page zero ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings