Hi Tom, On 2017-09-29 20:26:42 +0000, Tom Lane wrote: > get_rel_oids used to not take any relation locks at all, but that stopped > being a good idea with commit 3c3bb9933, which inserted a syscache lookup > into the function. A concurrent DROP TABLE could now produce "cache lookup > failed", which we don't want to have happen in normal operation. The best > solution seems to be to transiently take a lock on the relation named by > the RangeVar (which also makes the result of RangeVarGetRelid a lot less > spongy). But we shouldn't hold the lock beyond this function, because we > don't want VACUUM to lock more than one table at a time. (That would not > be a big problem right now, but it will become one after the pending > feature patch to allow multiple tables to be named in VACUUM.)
I'm not sure how big a problem this is, but I suspect it is one. Autovacuum used to skip relations when they're locked, and the vacuum isn't an anti-wraparound one. But after this change it appears we'll get stuck behind this new lock, even if VACOPT_NOWAIT is specified. That's bad because now relations that are AEL locked (e.g. because of some longrunning DDL) can block global autovacuum progress. I noticed this when reviewing a patch that adds NOWAIT (now renamed to SKIP LOCKED) as a user visible knob and it getting stuck behind an AEL. The updated version of the patch http://archives.postgresql.org/message-id/7327B413-1A57-477F-A6A0-6FD80CB5482D%40amazon.com has an attempt at fixing the issue, although I've not yet looked at it in any sort of detail. I suspect we should break out that part once reviewed, and backpatch to 10? Greetings, Andres Freund