Robert Haas <robertmh...@gmail.com> writes:
> Wait a minute: I'm confused.  What's at issue here, at least AIUI, is
> taking a TOAST pointer and fetching the corresponding value.  But that
> must involve reading from the TOAST table, and our usual paradigm for
> reading from system catalogs is (1) take AccessShareLock, (2) read the
> relevant rows, (3) release AccessShareLock.  If we're doing that here,
> then AcceptInvalidationMessages() is already getting called.  If we're
> not, this seems horribly unsafe; aside from the current bug, somebody
> could rewrite the table in the interim.
Yes, the TOAST fetch will call AcceptInvalidationMessages().  But
(1) we are starting from a TOAST pointer that is within a now-stale
syscache entry, and even if we get an sinval message telling us it's
stale when we open the toast table, we're still going to try to fetch
that toast OID.  Worse, (2) there is no guarantee that the inval message
has even been sent yet, because there is no lock keeping us from trying
to use the syscache entry before the inval has been sent.

After further reflection I believe that pg_statistic entries invalidated
by ANALYZE are not the only problem.  Consider a pg_proc row whose
prosrc field is toasted (not too unlikely), and suppose that somebody
executes CREATE OR REPLACE FUNCTION to update it to a different toasted
value.  Meanwhile, somebody else is about to use the function, and they
have a copy of its pg_proc row in syscache.  There is no lock that will
block that second backend from fetching the toast tuples before it's
seen the sinval message from the CREATE OR REPLACE FUNCTION.  So the
exact same type of race can occur, if the first backend gets delayed
between committing and broadcasting its sinval messages, and an
overeager vacuum comes along to clean up the dead toast tuples.

I am currently thinking that we will have to go with Heikki's solution
of detoasting any out-of-line fields before we put a tuple into
syscache.  That does fix the problem, so long as we hold a transaction
snapshot at the instant we fetch any catalog tuple that needs this
treatment, because at the time we fetch the tuple we make a SnapshotNow
test that shows the tuple is good (and its dependent toast tuples are
too).  Even if somebody has outdated the tuple and commits immediately
afterward, vacuum cannot remove the toast tuples before we fetch them,
because the outdater is beyond our advertised MyProc->xmin.  The risk
factor comes in when we hold a syscache entry across transactions; then
this guarantee is lost.
(In both the original example and the pg_proc case above, the second
backend is only at risk when it starts its transaction after the first
one's commit, and in between VACUUM was able to compute an OldestXmin
greater than the first one's XID.)

I guess an alternative approach would be to discard syscache entries at
transaction end if HeapTupleHasExternal is true, or equivalently mark
them as to which transaction read them and not use them in later
transactions.  But that seems more fragile, because it would have to be
tied into SnapshotResetXmin --- any time we discard our last snapshot
intra-transaction, toasted syscache entries would have to be
invalidated, since our advertised xmin is no longer protecting them.

The main concern I had about detoast-before-caching is the risk of
circularity, ie, needing detoastable cache entries in order to figure
out how to detoast.  But I think it's probably okay.  The current list
of catalogs with toast tables is

	pg_attrdef
	pg_constraint
	pg_database
	pg_db_role_setting
	pg_description
	pg_proc
	pg_rewrite
	pg_seclabel
	pg_shdescription
	pg_statistic
	pg_trigger

Of these, only pg_proc is even conceivably consulted during a toast
table fetch, and we can be sure that functions needed for such a fetch
don't have toasted entries.  But we will have to be very wary of any
future proposal for adding a toast table to pg_class, pg_index, etc.

Detoast-before-caching also seems sufficiently safe and localized to be
a back-patchable fix, whereas I'd be very scared of making any
significant overhaul of the sinval mechanism in the back branches.

Barring objections, I'll see about making this happen.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers