On Thu, Sep 26, 2013 at 8:56 PM, Andres Freund <and...@2ndquadrant.com> wrote: > On 2013-09-26 20:47:33 +0900, Michael Paquier wrote: >> On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund <and...@2ndquadrant.com> >> wrote: >> > On 2013-09-26 20:40:40 +0900, Michael Paquier wrote: >> >> On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <and...@2ndquadrant.com> >> >> wrote: >> >> > On 2013-09-26 12:13:30 +0900, Michael Paquier wrote: >> >> >> > 2) I don't think the drop algorithm used now is correct. Your >> >> >> > index_concurrent_set_dead() sets both indisvalid = false and >> >> >> > indislive = >> >> >> > false at the same time. It does so after doing a >> >> >> > WaitForVirtualLocks() - >> >> >> > but that's not sufficient. Between waiting and setting indisvalid = >> >> >> > false another transaction could start which then would start using >> >> >> > that >> >> >> > index. Which will not get updated anymore by other concurrent >> >> >> > backends >> >> >> > because of inislive = false. >> >> >> > You really need to follow index_drop's lead here and first unset >> >> >> > indisvalid then wait till nobody can use the index for querying >> >> >> > anymore >> >> >> > and only then unset indislive. >> >> > >> >> >> Sorry, I do not follow you here. index_concurrent_set_dead calls >> >> >> index_set_state_flags that sets indislive and *indisready* to false, >> >> >> not indisvalid. The concurrent index never uses indisvalid = true so >> >> >> it can never be called by another backend for a read query. The drop >> >> >> algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw. >> >> > >> >> > That makes it even worse... You can do the concurrent drop only in the >> >> > following steps: >> >> > 1) set indisvalid = false, no future relcache lookups will have it as >> >> > valid >> > >> >> indisvalid is never set to true for the concurrent index. Swap is done >> >> with concurrent index having indisvalid = false and former index with >> >> indisvalid = true. The concurrent index is validated with >> >> index_validate in a transaction before swap transaction. >> > >> > Yes. I've described how it *has* to be done, not how it's done. >> > >> > The current method of going straight to isready = false for the original >> > index will result in wrong results because it's not updated anymore >> > while it's still being used. > >> The index being dropped at the end of process is not the former index, >> but the concurrent index. The index used after REINDEX CONCURRENTLY is >> the old index but with the new relfilenode. > > That's not relevant unless I miss something. > > After phase 4 both indexes are valid (although only the old one is > flagged as such), but due to the switching of the relfilenodes backends > could have either of both open, depending on the time they built the > relcache entry. Right? > Then you go ahead and mark the old index - which still might be used! - > as dead in phase 5. Which means other backends might (again, depending > on the time they have built the relcache entry) not update it > anymore. In read committed we very well might go ahead and use the index > with the same plan as before, but with a new snapshot. Which now will > miss entries. In this case, doing a call to WaitForOldSnapshots after the swap phase is enough. It was included in past versions of the patch but removed in the last 2 versions.
Btw, taking the problem from another viewpoint... This feature has now 3 patches, the 2 first patches doing only code refactoring. Could it be possible to have a look at those ones first? Straight-forward things should go first, simplifying the core feature evaluation. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers