On 13.04.2018 18:41, Andres Freund wrote:
> On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
>> Updated patch is attached.
>>
>> + * Ensure that only one backend is checking for deadlock.
>> + * Otherwise, under high load, a cascade of deadlock timeout
>> + * expirations can cause Postgres to become stuck.
>> + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
>> + enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
>> + inside_deadlock_check = true;
>
> I can't see that ever being accepted. This means there's absolutely no
> bound for deadlock checks happening even under light concurrency, even
> if there's no contention for a large fraction of the time.
It may cause problems only if:
1. There is a large number of active sessions.
2. They perform deadlock-prone queries (i.e. no attempt is made to avoid
deadlocks at the application level).
3. The deadlock timeout is set very small (10 msec?).

Otherwise, either the probability that all backends try to check for
deadlocks concurrently again and again is very small (and it can be
reduced even further by using a random timeout for subsequent deadlock
checks), or the system cannot function normally in any case because a
large number of clients fall into deadlock.
> If you want to improve this, improve the efficiency of the
> implementation: check multiple lockers at the same time (set a bit
> afterwards so that they don't recheck themselves). There are plenty of
> approaches that don't result in a significantly worse user experience.
It seems to me that the best solution would be to perform deadlock
detection in a dedicated process (a deadlock checker) at a specified
frequency, or to do it in the backends themselves while ensuring that
the interval between deadlock checks exceeds deadlock_timeout.
In both cases it is necessary to check for all deadlock loops, not only
deadlocks involving the backend's own transaction.
And we definitely should be able to abort a "foreign" transaction, not
only a transaction performed by this backend.
I do not know how difficult this will be to implement. I expect it will
not be easy; otherwise, why wasn't deadlock detection implemented this
way in Postgres from the very beginning...
I completely agree that there are plenty of different approaches, but
IMHO the currently used strategy is the worst one, because it can stall
the system even when there are no deadlocks at all.
I have always thought that a deadlock is a programmer's error rather
than a normal situation. Maybe that is a wrong assumption, but I cannot
believe that a system can work normally when deadlocks happen
frequently.
Consider the classical example: transferring money from one account to
another:

update accounts set balance=balance+? where aid=?;
update accounts set balance=balance-? where aid=?;

With thousands of accounts and a random choice of account IDs,
concurrent execution of this transaction by multiple clients (~100)
causes deadlocks in less than a few percent of transactions.
My patch does not noticeably affect performance or latency in this case.
Obviously, if the number of accounts is reduced to 10 and the number of
clients increased to 1000, then deadlocks will happen constantly. But in
that case the system will not be able to work normally either with my
patch or without it.
So before implementing some complicated solution to the problem (too
slow deadlock detection), I think we first need to understand whether
this problem exists at all and under which workloads it can occur.
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company