Re: [HACKERS] XactLockTableWait doesn't set wait_event correctly
On Thu, Dec 1, 2016 at 7:35 PM, Simon Riggs wrote: > On 2 December 2016 at 00:28, Robert Haas wrote: >> On Wed, Nov 30, 2016 at 6:50 AM, Simon Riggs wrote: >>> Obtaining a tuple lock requires two separate actions: First we do >>> LockTuple() and then we do XactLockTableWait(). >> >> I think that's kind of a confusing way of looking at it. LockTuple() >> waits for a "tuple" lmgr lock, and XactLockTableWait waits for a >> "transaction" lmgr lock. Those two things are both part of a larger >> protocol for managing access to what we refer to as tuple locks at the >> SQL level. I don't think conflating those things would be a very good >> idea, because it's useful to know which phase you're currently doing - >> e.g. anybody waiting on a tuple lock is not first in the queue. > > Why is it useful to know which phase you're at? What can I do with that info? Well, I just mentioned one thing you can do with it... > Why is knowing that you're doing a "transaction lock" more important > than the fact you're doing a tuple lock on a particular db, relation, > page and tuple? The "transaction lock" tells you nothing a user can > act upon to improve the situation. I think the charter of pg_locks is to provide visibility into what's actually going on in the lock table. It could be useful to provide additional supplementary information as well, if we can do that in a reasonable way, but I think trying to paper over the fact that internally it's a transaction lock probably isn't a good idea. For example, suppose next week somebody decides to replace the "waiting" column in pg_locks with a column indicating the position in the lock queue, with NULL indicating a granted lock. Well, if transaction locks look like tuple locks, the person writing that patch will find it impossible to display something sane in those cases, because the transaction lock queue is separate from the tuple lock queue and you can't somehow glue those together without distorting what the lock manager is actually doing. That kind of thing is the natural consequence of displaying something other than what the system is actually doing internally. But I have no objection to you adding supplementary detail also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XactLockTableWait doesn't set wait_event correctly
On 2 December 2016 at 00:28, Robert Haas wrote: > On Wed, Nov 30, 2016 at 6:50 AM, Simon Riggs wrote: >> Obtaining a tuple lock requires two separate actions: First we do >> LockTuple() and then we do XactLockTableWait(). > > I think that's kind of a confusing way of looking at it. LockTuple() > waits for a "tuple" lmgr lock, and XactLockTableWait waits for a > "transaction" lmgr lock. Those two things are both part of a larger > protocol for managing access to what we refer to as tuple locks at the > SQL level. I don't think conflating those things would be a very good > idea, because it's useful to know which phase you're currently doing - > e.g. anybody waiting on a tuple lock is not first in the queue. Why is it useful to know which phase you're at? What can I do with that info? Why is knowing that you're doing a "transaction lock" more important than the fact you're doing a tuple lock on a particular db, relation, page and tuple? The "transaction lock" tells you nothing a user can act upon to improve the situation. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XactLockTableWait doesn't set wait_event correctly
On Wed, Nov 30, 2016 at 6:50 AM, Simon Riggs wrote: > Obtaining a tuple lock requires two separate actions: First we do > LockTuple() and then we do XactLockTableWait(). I think that's kind of a confusing way of looking at it. LockTuple() waits for a "tuple" lmgr lock, and XactLockTableWait waits for a "transaction" lmgr lock. Those two things are both part of a larger protocol for managing access to what we refer to as tuple locks at the SQL level. I don't think conflating those things would be a very good idea, because it's useful to know which phase you're currently doing - e.g. anybody waiting on a tuple lock is not first in the queue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XactLockTableWait doesn't set wait_event correctly
On 29 November 2016 at 19:03, Amit Kapila wrote: > how will we distinguish it when some > process is actually waiting on tuple lock? The point is that both those actions are waiting for a tuple lock. Obtaining a tuple lock requires two separate actions: First we do LockTuple() and then we do XactLockTableWait(). So log_lock_wait output has two separate log entries for the same thing, which is inconsistent. (One mentions the relation name, the other mentions the relation id). (Note that I'm not saying that all callers of XactLockTableWait are tuple lockers; if they were it would be less confusing). But at least that info is visible. log_lock_waits output allows you to see that a XactLockTableWait is actually held for a tuple. There is no way to do that for pg_stat_activity or pg_locks output, which is inconsistent. I'm not worried about abstraction layers, I'd just like to get useful information out of the server to diagnose locking issues. Right now, nobody can do that. My proposal to resolve this is... 1. Make log_lock_wait output the same for both cases... following this format LOG: process 648 still waiting for ExclusiveLock on tuple (0,1) of relation 137344 of database 12245 after 1045.220 ms DETAIL: Process holding the lock: 6460. Wait queue: 648. STATEMENT: update t1 set c1=4 where c1=1; Nobody will miss the other format, since the above format has all the same information. 2. Set wait_event_type = tuple when we wait during XactLockTableWait. We need the reason info, not the actual wait info, since this is for users not for our own diagnostics. This isn't very important, since wait_event_type doesn't include details like which tuple or relation caused the wait. 3. pg_locks output can't fit both locktag and reason info inside the LOCKTAG struct, so we'd need to do something like store the reason info in a separate hash table, so it can be used to explain a transaction lock entry. I'm sure that will raise an objection, so we'll need something like a view called pg_lock_wait_reason. Better suggestions welcome. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XactLockTableWait doesn't set wait_event correctly
On Tue, Nov 29, 2016 at 6:26 AM, Simon Riggs wrote: > New (9.4) XactLockTableWait() sets the reason for the wait, so that if > we emit a log message then it will produce a log message to say its > not waiting on a transaction, its waiting on a lock. > > Nice. log_lock_waits works great. > > New (9.6) wait_event code is not correctly set. In ProcSleep() we set > the wait reason according to the actual lock tag, effectively ignoring > the information specifically provided by XactLockTableWait(). > > Ugh. > > pg_stat_activity shows a wait_event of 'transactionid' rather than 'tuple' > pg_locks shows a transactionid lock rather than a wait for tuple lock > > That looks like a bug since we get different answers from > log_lock_wait and wait_event, which is annoying and confusing. What you find annoying or confusing is up to you, but it's not a bug, and it's not particularly inconsistent, either. log_lock_waits reports a transaction lock, pg_stat_activity reports a transactionid lock, and pg_locks reports a transactionid lock. Now, it is true *in addition* to reporting that the lock in question is a transaction lock, log_lock_waits also reports some context information: 2016-11-29 13:49:59.702 EST [33658] LOG: process 33658 still waiting for ShareLock on transaction 2997486 after 1000.748 ms 2016-11-29 13:49:59.702 EST [33658] DETAIL: Process holding the lock: 33653. Wait queue: 33658. 2016-11-29 13:49:59.702 EST [33658] CONTEXT: while updating tuple (0,67) in relation "pgbench_accounts" 2016-11-29 13:49:59.702 EST [33658] STATEMENT: update pgbench_accounts set filler = 'b' where aid = 1; ...but right there on the first line, it clearly says "waiting for ShareLock on transaction 2997". I'm not totally unsympathetic to the argument that perhaps XactLockTableWait() ought to pass down the context information through LockAcquire() so that it reaches ProcSleep() and we can set the wait event to something like Tuple/Update instead of Lock/Transaction. But I don't think it's a slam-dunk, either. If we did that, it would be consistent with the additional detail provided by log_lock_waits rather than the primary message, and it would be inconsistent with pg_locks. Also, I think that the reason why we originally used an error-context callback here rather than passing down some additional detail into LockAcquire() is that the breaking the abstraction layer seemed messy and unappealing. IIRC, Tom, in particular, was concerned about that last point. But feel free to propose a patch... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XactLockTableWait doesn't set wait_event correctly
On Tue, Nov 29, 2016 at 4:56 PM, Simon Riggs wrote: > New (9.4) XactLockTableWait() sets the reason for the wait, so that if > we emit a log message then it will produce a log message to say its > not waiting on a transaction, its waiting on a lock. > > Nice. log_lock_waits works great. > > New (9.6) wait_event code is not correctly set. In ProcSleep() we set > the wait reason according to the actual lock tag, effectively ignoring > the information specifically provided by XactLockTableWait(). > The information provided by XactLockTableWait() helps to display the more information via context, but the message still suggests transaction (refer LOG line in below message): LOG: process 6460 still waiting for ShareLock on transaction 42960 after 41822.775 ms DETAIL: Process holding the lock: 5704. Wait queue: 6460. CONTEXT: while updating tuple (0,1) in relation "t1" STATEMENT: update t1 set c1=3 where c1=1; So I am not sure if displaying tuple in pg_stat_activity is better than displaying transactionid and how will we distinguish it when some process is actually waiting on tuple lock? The process waiting on tuple lock displays log message as below: LOG: process 648 still waiting for ExclusiveLock on tuple (0,1) of relation 137344 of database 12245 after 1045.220 ms DETAIL: Process holding the lock: 6460. Wait queue: 648. STATEMENT: update t1 set c1=4 where c1=1; -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] XactLockTableWait doesn't set wait_event correctly
New (9.4) XactLockTableWait() sets the reason for the wait, so that if we emit a log message then it will produce a log message to say its not waiting on a transaction, its waiting on a lock. Nice. log_lock_waits works great. New (9.6) wait_event code is not correctly set. In ProcSleep() we set the wait reason according to the actual lock tag, effectively ignoring the information specifically provided by XactLockTableWait(). Ugh. pg_stat_activity shows a wait_event of 'transactionid' rather than 'tuple' pg_locks shows a transactionid lock rather than a wait for tuple lock That looks like a bug since we get different answers from log_lock_wait and wait_event, which is annoying and confusing. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers