On Fri, Apr 1, 2022 at 2:04 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > The v1-0003 patch introduced smgropen_cond() to avoid the problem of > IssuePendingWritebacks(), which does desynchronised smgropen() calls > and could open files after the barrier but just before they are > unlinked. Makes sense, but... > > 1. For that to actually work, we'd better call smgrcloseall() when > PROCSIGNAL_BARRIER_SMGRRELEASE is handled; currently it only calls > closeAllVds(). Here's a patch for that. But...
What if we instead changed our approach to fixing IssuePendingWritebacks()? Design sketch: 1. We add a new flag, bool smgr_zonk, to SmgrRelationData. Initially it's false. 2. Receipt of PROCSIGNAL_BARRIER_SMGRRELEASE sets smgr_zonk to true. 3. If you want, you can set it back to false any time you access the smgr while holding a relation lock. 4. IssuePendingWritebacks() uses smgropen_cond(), as today, but that function now refuses either to create a new smgr or to return one where smgr_zonk is true. I think this would be sufficient to guarantee that we never try to write back to a relfilenode if we haven't had a lock on it since the last PROCSIGNAL_BARRIER_SMGRRELEASE, which I think is the invariant we need here? My thinking here is that it's a lot easier to get away with marking the content of a data structure invalid than it is to get away with invalidating pointers to that data structure. If we can tinker with the design so that the SMgrRelationData doesn't actually go away but just gets a little bit more thoroughly invalidated, we could avoid having to audit the entire code base for code that keeps smgr handles around longer than would be possible with the design you demonstrate here. -- Robert Haas EDB: http://www.enterprisedb.com