On 2013-10-21 16:15:58 +0200, Andres Freund wrote:
> > I don't think I understand exactly what you have in mind for (2); can
> > you elaborate?  I have always thought that having a
> > WaitForDecodingToCatchUp() primitive was a good way of handling
> > changes that were otherwise too difficult to track our way through.  I
> > am not sure you're doing that at all right now, which in some sense I
> > guess is fine, but I haven't really understood your aversion to this
> > solution.  There are some locking issues to be worked out here, but
> > the problems don't seem altogether intractable.
> So, what we need to do for rewriting catalog tables would be:
> 1) lock table against writes
> 2) wait for all in-progress xacts to finish, they could have modified
>    the table in question (we don't keep locks on system tables)
> 3) acquire xlog insert pointer
> 4) wait for all logical decoding actions to read past that pointer
> 5) upgrade the lock to an access exclusive one
> 6) perform vacuum full as usual
> The lock upgrade hazards in here are the reason I am averse to the
> solution. And I don't see how we can avoid them, since in order for
> decoding to catch up it has to be able to read from the
> catalog... Otherwise it's easy enough to implement.

So, I thought about this some more and I think I have a partial
solution to the problem.

The worst thing about the deadlocks that occur in the above is that they
can consist of the VACUUM FULL waiting for the "restart LSN"[1] of a
decoding slot to progress, while the restart LSN cannot progress because
the slot is waiting for an xid/transaction to end, which in turn is being
blocked by the lock upgrade from VACUUM FULL. Such conflicts are not
visible to the deadlock detector, which obviously is bad.
I've prototyped this (~25 lines) and such deadlocks happen pretty
frequently. But it turns out that we can actually fix this by exporting
(to shared memory) the oldest in-progress xid of a decoding slot. Then
the waiting code can do an XactLockTableWait() for that xid...
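
To make the shape of the problem concrete, here is a toy model in Python
(not PostgreSQL code). It treats the deadlock detector as a cycle search
over a wait-for graph that only contains the edges the lock manager knows
about. The node names (vacuum_full, slot, xact_A), the split into
"visible" and "hidden" edges, and the find_cycle() helper are all
assumptions made for illustration; the real detector is considerably
more involved.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def find_cycle(waits_for: Graph) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a node list, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is on the current path, so we closed a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(waits_for):
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


def merge(a: Graph, b: Graph) -> Graph:
    """Union of two wait-for graphs."""
    out: Graph = {k: list(v) for k, v in a.items()}
    for k, v in b.items():
        out.setdefault(k, []).extend(v)
    return out


# Assumed model: the only wait the lock manager sees is the transaction
# blocked behind VACUUM FULL's lock.
visible_before: Graph = {"xact_A": ["vacuum_full"]}

# Assumed hidden waits: VACUUM FULL waits for the slot's restart LSN,
# and the slot waits for xact_A to end.
hidden: Graph = {"vacuum_full": ["slot"], "slot": ["xact_A"]}

# With the slot's oldest in-progress xid exported, VACUUM FULL can wait
# on that xid directly, which adds an edge the detector can see.
visible_after: Graph = merge(visible_before, {"vacuum_full": ["xact_A"]})

real_cycle = find_cycle(merge(visible_before, hidden))
seen_before = find_cycle(visible_before)
seen_after = find_cycle(visible_after)
```

In this model the real graph contains a cycle, the detector's view before
the change contains none, and the view after the change contains the
cycle between vacuum_full and xact_A, so it can be detected and broken.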

I wonder if this isn't maybe sufficient. Yes, it can deadlock, but
that's already the case for VACUUM FULLs of system tables, although it's
less likely there. And it will be detected/handled.
There's one more snag, though: we currently allow CLUSTER system_table;
in an existing transaction. I think that'd have to be disallowed.

What do you think?


Andres Freund

[1] The "restart LSN" is the point from where we need to be able to read
WAL to replay all changes the receiving side hasn't acked yet.

 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
