Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

Robert Haas Wed, 17 Feb 2016 06:53:29 -0800

On Tue, Feb 16, 2016 at 2:59 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@anarazel.de> writes:
>> I wonder if we shouldn't just expose a 'which pid is process X waiting
>> for' API, implemented serverside. That's generally really useful, and
>> looks like it's actually going to be less complicated than that
>> query... And it's surely going to be faster.
>
> Attached is a draft patch for a new function that reports the set of PIDs
> directly blocking a given PID (that is, holding or awaiting conflicting
> locks on the lockable object it's waiting for).
>
> I replaced isolationtester's pg_locks query with this, and found that
> it's about 9X faster in a normal build, and 3X faster with
> CLOBBER_CACHE_ALWAYS turned on.  That would give us some nice headroom
> for the isolation tests with CLOBBER_CACHE_ALWAYS animals.  (Note that
> in view of
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2016-02-14%2007%3A38%3A37
> we still need to do *something* about the speed of the new deadlock-hard
> test; this patch could avoid the need to dumb down or slow down that test
> even further.)
>
> Not to be neglected also is that (I believe) this gives the right answer,
> whereas isolationtester's existing query is currently completely broken by
> parallel queries, and it doesn't understand non-conflicting lock modes
> either.  (It did, at least partially, before commit 38f8bdcac4982215;
> I am not sure that taking out the mode checks was a good idea.  But
> putting them back would make the query slower yet.)


The reason I took that out is because it breaks the deadlock-soft
test.  It's possible to have a situation where no granted lock
conflicts with an awaited lock.  If that happens, the old query
wrongly concluded that the waiting process was not in fact waiting.
(Consider A hold AccessShareLock, B awaits AccessExclusiveLock, C now
requests AccessShareLock and *waits*.)

As for the patch itself, I'm having trouble grokking what it's trying
to do.  I think it might be worth having a comment defining precisely
what we mean by "A blocks B".  I would define "A blocks B" in general
as either A holds a lock which conflicts with one sought by B
(hard-blocked) or A awaits a lock which conflicts with one sought by B
and precedes it in the wait queue (soft-blocked).  I have wondered
before if we shouldn't modify pg_locks to expose the wait-queue
ordering; without that, you can't reliably determine in general
whether A soft-blocks B, which means every view anyone has ever
written over pg_locks that purports to say who blocks who is
necessarily buggy.

For parallel queries, there's a further relevant distinction when we
say "A blocks B".  We might mean either that (1) process B cannot
resume execution until the lock conflict is resolved or (2) the group
leader for process B cannot complete the current parallel operation
until the lock conflict is resolved.  If you're trying to figure out
why one particular member of a parallel group is stuck, you want to
answer question #1.  If you're trying to figure out what all the
things that need to get out of the way to finish the query, you want
to answer question #2.  I think this function is aiming to answer
question #2, not question #1, but I'm less clear on the reason behind
that definitional choice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

Reply via email to