On Fri, Oct 25, 2013 at 8:14 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> So, I thought about this for some more and I think I've a partial
> solution to the problem.
> The worst thing about deadlocks that occur in the above is that they
> could be the VACUUM FULL waiting for the "restart LSN" of a decoding
> slot to progress, but the restart LSN cannot progress because the slot
> is waiting for a xid/transaction to end which is being blocked by the
> lock upgrade from VACUUM FULL. Such conflicts are not visible to the
> deadlock detector, which obviously is bad.
> I've prototyped this (~25 lines) and this happens pretty frequently. But
> it turns out that we can actually fix this by exporting (to shared
> memory) the oldest in-progress xid of a decoding slot. Then the waiting
> code can do a XactLockTableWait() for that xid...
> I wonder if this isn't maybe sufficient. Yes, it can deadlock, but
> that's already the case for VACUUM FULLs of system tables, although less
> likely. And it will be detected/handled.
> There's one more snag though, we currently allow CLUSTER system_table;
> in an existing transaction. I think that'd have to be disallowed.
It wouldn't bother me too much to restrict CLUSTER system_table by
PreventTransactionChain() at wal_level = logical, but obviously it
would be nicer if we *didn't* have to do that.
In general, I don't think waiting on an XID is sufficient because a
process can acquire a heavyweight lock without having an XID. Perhaps
use the VXID instead?
One thought I had about waiting for decoding to catch up is that you
might do it before acquiring the lock. Of course, you then have a
problem if you get behind again before acquiring the lock. It's
tempting to adopt the solution we used for RangeVarGetRelidExtended,
namely: wait for catchup without the lock, acquire the lock, then check
whether we're still caught up; if so, great; otherwise release the lock and loop.
But there's probably too much starvation risk to get away with that.
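To make that retry loop concrete, here is a rough sketch in toy C. It is not backend code: SimState, caught_up(), wait_for_catchup(), acquire_lock(), release_lock() and lock_when_caught_up() are all names invented for illustration, and the "concurrent WAL" is simulated with a counter. The max_attempts bound is also my own addition, there only to show where the starvation worry comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for backend state; nothing here exists in PostgreSQL. */
typedef struct SimState
{
    uint64_t wal_insert_lsn;  /* how far WAL has been written */
    uint64_t decoded_lsn;     /* how far the decoding slot has consumed */
    bool     lock_held;       /* do we currently hold the table lock? */
    int      lag_bursts_left; /* times new WAL sneaks in while we wait for the lock */
} SimState;

static bool
caught_up(const SimState *s)
{
    return s->decoded_lsn >= s->wal_insert_lsn;
}

/* Step 1: wait for decoding to catch up, without holding the lock. */
static void
wait_for_catchup(SimState *s)
{
    s->decoded_lsn = s->wal_insert_lsn;
}

/* Step 2: take the lock; other sessions may write WAL while we queue for it. */
static void
acquire_lock(SimState *s)
{
    if (s->lag_bursts_left > 0)
    {
        s->lag_bursts_left--;
        s->wal_insert_lsn += 100;
    }
    s->lock_held = true;
}

static void
release_lock(SimState *s)
{
    s->lock_held = false;
}

/*
 * Returns the number of attempts needed to hold the lock while caught up,
 * or -1 if we gave up after max_attempts (the starvation case).
 */
int
lock_when_caught_up(SimState *s, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        wait_for_catchup(s);
        acquire_lock(s);
        if (caught_up(s))
            return attempt;   /* step 3: still caught up, keep the lock */
        release_lock(s);      /* fell behind again: drop the lock and retry */
    }
    assert(!s->lock_held);    /* we never give up while holding the lock */
    return -1;
}
```

With no concurrent WAL the loop succeeds on the first pass; with a steady stream of it, nothing guarantees the loop ever terminates, which is the starvation problem.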
On the whole, I'm leaning toward thinking that the other solution
(recording the old-to-new CTID mappings generated by CLUSTER to the
extent that they are needed) is probably more elegant.