Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-08-25 Thread Amit Kapila
On Fri, Aug 26, 2016 at 4:59 AM, neha khatri  wrote:

> Hello,
>
> I noticed that a small optimization is possible in the flow of wait-stat
> reporting for LWLocks when pgstat_track_activities is disabled.
> If the check for pgstat_track_activities is done before invoking
> LWLockReportWaitStart() instead of inside pgstat_report_wait_start(),
> it can avoid executing the statements for which the end result of
> LWLockReportWaitStart() is a no-op because pgstat_track_activities = false.
>
>
This is only called in the slow path, i.e. when we have to wait or sleep,
so saving a few instructions will not make much difference.  Note that both
of the functions you have mentioned are inline functions.  However, if you
want, you can try it that way and see whether it leads to any gain.


> I also see that there are other callers of pgstat_report_wait_start() as
> well, which would also have to change to add a check for
> pgstat_track_activities if the check is removed from
> pgstat_report_wait_start(). Is the pgstat_track_activities check inside
> pgstat_report_wait_start() because of some protocol being followed?
>

Not as such; it is just that the variable is mainly used within pgstat.c/.h.


> Would it be worth making that change?
>
>
-1 for the proposed change.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-08-25 Thread neha khatri
Hello,

I noticed that a small optimization is possible in the flow of wait-stat
reporting for LWLocks when pgstat_track_activities is disabled.
If the check for pgstat_track_activities is done before invoking
LWLockReportWaitStart() instead of inside pgstat_report_wait_start(),
it can avoid executing the statements for which the end result of
LWLockReportWaitStart() is a no-op because pgstat_track_activities = false.

I also see that there are other callers of pgstat_report_wait_start() as
well, which would also have to change to add a check for
pgstat_track_activities if the check is removed from
pgstat_report_wait_start(). Is the pgstat_track_activities check inside
pgstat_report_wait_start() because of some protocol being followed? Would
it be worth making that change?

Regards,
Neha


On Thu, Mar 10, 2016 at 12:31 AM, Amit Kapila 
wrote:

> On Fri, Mar 4, 2016 at 7:11 PM, Alexander Korotkov 
> wrote:
> >
> >>
> >> If yes, then the only slight worry is that there will lot of repetition
> in wait_event_type column, otherwise it is okay.
> >
> >
> > There is morerows attribute of entry tag in Docbook SGML, it behaves
> like rowspan in HTML.
> >
>
> Thanks for the suggestion.  I have updated the patch to include
> wait_event_type information in the wait_event table.
>
> As asked above by Robert, below is performance data with the patch.
>
> M/C Details
> --
> IBM POWER-8 24 cores, 192 hardware threads
> RAM = 492GB
>
> Performance Data
> 
> min_wal_size=15GB
> max_wal_size=20GB
> checkpoint_timeout=15min
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.9
>
>
> pgbench read-only (median of 3, 5-min runs)
>
> clients BASE PATCH %
> 1 19703.549206 19992.141542 1.4646718364
> 8 120105.542849 127717.835367 6.3380026745
> 64 487334.338764 495861.7211254 1.7498012521
>
>
> The read-only data shows some improvement with patch, but I think this is
> mostly attributed to run-to-run variation.
>
> pgbench read-write (median of 3, 30-min runs)
>
> clients BASE PATCH %
> 1 1703.275728 1696.568881 -0.3937616729
> 8 8884.406185 9442.387472 6.2804567394
> 64 32648.82798 32113.002416 -1.6411785572
>
>
> In the above data, the read-write data shows small regression (1.6%) at
> higher client-count, but when I ran individually that test, the difference
> was 0.5%. I think it is mostly attributed to run-to-run variation as we see
> with read-only tests.
>
>
> Thanks to Mithun C Y for doing performance testing of this patch.
>
> As this patch is adding 4-byte variable to shared memory structure PGPROC,
> so this is susceptible to memory alignment issues for shared buffers as
> discussed in thread [1], but in general the performance data doesn't
> indicate any regression.
>
> [1] - http://www.postgresql.org/message-id/CAA4eK1+ZeB8PMwwktf+
> 3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-24 Thread Robert Haas
On Thu, Mar 24, 2016 at 7:28 AM, Alexander Korotkov
 wrote:
> Since the patch exposing current wait event information in PGPROC was
> committed, it has become possible to collect wait event statistics using
> sampling.  Although I'm not a fan of this approach, it is still useful and
> definitely better than nothing.
> In PostgresPro, we actually already had this.  Now it's too late to include
> something new in 9.6.  This is why I've reworked it and published it on
> github as an extension for 9.6: https://github.com/postgrespro/pg_wait_sampling/
> Hopefully, it can be considered as contrib for 9.7.

Spiffy.  That was fast.

I think the sampling approach is going to be best on very large
systems under heavy load; I suspect counting every event is going to
be too expensive - especially once we add more events for things like
block read and client wait.  It is quite possible that we can do other
things when tracing individual sessions or in scenarios where some
performance degradation is OK.  But I like the idea of doing the
sampling thing first - I think that will be very useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-24 Thread Alexander Korotkov
Hi!

Since the patch exposing current wait event information in PGPROC was
committed, it has become possible to collect wait event statistics using
sampling.  Although I'm not a fan of this approach, it is still useful and
definitely better than nothing.
In PostgresPro, we actually already had this.  Now it's too late to include
something new in 9.6.  This is why I've reworked it and published it on
github as an extension for 9.6: https://github.com/postgrespro/pg_wait_sampling/
Hopefully, it can be considered as contrib for 9.7.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-19 Thread Andres Freund
On 2016-03-18 11:01:04 -0400, Robert Haas wrote:
> On Thu, Mar 17, 2016 at 9:22 PM, Michael Paquier
>  wrote:
> > FWIW, my instinctive thought on the matter is to report the event
> > directly in WaitLatch() via an event name provided directly by the
> > caller. The category of the event is then defined automatically, as we
> > would know its origin. The code path from which an event type
> > originates is, I think, the critical thing for defining an event
> > category. The LWLock events already do that in lwlock.c.
> 
> I'm very skeptical of grouping everything that waits using latches as
> a latch wait, but maybe it's OK to do it that way.  I was thinking
> more of adding categories like "client wait" with events like "client
> read" and "client write".

+1. I think categorizing latch waits together will be pretty much
meaningless. We use the same latch for a lot of different things, and
the context in which we're waiting is the important bit.

Andres




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-19 Thread Robert Haas
On Wed, Mar 16, 2016 at 10:03 PM, Michael Paquier
 wrote:
> On Wed, Mar 16, 2016 at 5:28 AM, Robert Haas  wrote:
>> On Tue, Mar 15, 2016 at 9:17 AM, Thomas Reiss  
>> wrote:
>>> Here's a small docpatch to fix two typos in the new documentation.
>>
>> Thanks, committed.
>
> I just had a quick look at the committed wait_event work, and I was a
> little disappointed that we do not actually track latch waits yet,
> though that is perhaps not that useful anyway as long as an event name
> is not associated with a given latch wait when calling WaitLatch. I am
> not asking for that in this release; this is just for the archive's
> sake, and I don't mind coding it myself if need be. The LWLock
> tracking facility looks rather cool, btw :)

Yes, I'm quite excited about this.  I think it's pretty darn awesome.

I doubt that it would be useful to treat a latch wait as an event.
It's too generic.  You'd want something more specific, like waiting
for WAL to arrive or waiting for a tuple from a parallel worker or
waiting to write to the client.  It'll take some thought to figure out
how to organize and categorize that stuff, but it'll also be wicked
cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-19 Thread Michael Paquier
On Wed, Mar 16, 2016 at 5:28 AM, Robert Haas  wrote:
> On Tue, Mar 15, 2016 at 9:17 AM, Thomas Reiss  wrote:
>> Here's a small docpatch to fix two typos in the new documentation.
>
> Thanks, committed.

I just had a quick look at the committed wait_event work, and I was a
little disappointed that we do not actually track latch waits yet,
though that is perhaps not that useful anyway as long as an event name
is not associated with a given latch wait when calling WaitLatch. I am
not asking for that in this release; this is just for the archive's
sake, and I don't mind coding it myself if need be. The LWLock
tracking facility looks rather cool, btw :)
-- 
Michael




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-19 Thread Amit Kapila
On Thu, Mar 17, 2016 at 7:33 AM, Michael Paquier 
wrote:
>
> On Wed, Mar 16, 2016 at 5:28 AM, Robert Haas wrote:
> > On Tue, Mar 15, 2016 at 9:17 AM, Thomas Reiss wrote:
> >> Here's a small docpatch to fix two typos in the new documentation.
> >
> > Thanks, committed.
>
> I just had a quick look at the committed wait_event work, and I was a
> little disappointed that we do not actually track latch waits yet,
> though that is perhaps not that useful anyway as long as an event name
> is not associated with a given latch wait when calling WaitLatch.
>

You are right, and a few more, like I/O waits, were also intentionally left
out of the initial patch just to get the base functionality in.  One of the
reasons I did not keep them in the patch is that they need much more thorough
performance testing (even though theoretically there shouldn't be any
overhead) with specific kinds of tests.

>
> I am not
> asking for that in this release; this is just for the archive's
> sake, and I don't mind coding it myself if need be.

Thanks; feel free to pick it up in the next release (or in this release, if
everybody feels strongly about having it now) if you don't see a patch for
it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-19 Thread Michael Paquier
On Thu, Mar 17, 2016 at 11:10 PM, Robert Haas  wrote:
> On Wed, Mar 16, 2016 at 10:03 PM, Michael Paquier
>  wrote:
>> On Wed, Mar 16, 2016 at 5:28 AM, Robert Haas  wrote:
>>> On Tue, Mar 15, 2016 at 9:17 AM, Thomas Reiss  
>>> wrote:
 Here's a small docpatch to fix two typos in the new documentation.
>>>
>>> Thanks, committed.
>>
>> I just had a quick look at the committed wait_event work, and I was a
>> little disappointed that we do not actually track latch waits yet,
>> though that is perhaps not that useful anyway as long as an event name
>> is not associated with a given latch wait when calling WaitLatch. I am
>> not asking for that in this release; this is just for the archive's
>> sake, and I don't mind coding it myself if need be. The LWLock
>> tracking facility looks rather cool, btw :)
>
> Yes, I'm quite excited about this.  I think it's pretty darn awesome.
>
> I doubt that it would be useful to treat a latch wait as an event.
> It's too generic.  You'd want something more specific, like waiting
> for WAL to arrive or waiting for a tuple from a parallel worker or
> waiting to write to the client.  It'll take some thought to figure out
> how to organize and categorize that stuff, but it'll also be wicked
> cool.

FWIW, my instinctive thought on the matter is to report the event
directly in WaitLatch() via an event name provided directly by the
caller. The category of the event is then defined automatically, as we
would know its origin. The code path from which an event type
originates is, I think, the critical thing for defining an event
category. The LWLock events already do that in lwlock.c.
-- 
Michael




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-18 Thread Robert Haas
On Thu, Mar 17, 2016 at 9:22 PM, Michael Paquier
 wrote:
> FWIW, my instinctive thought on the matter is to report the event
> directly in WaitLatch() via an event name provided directly by the
> caller. The category of the event is then defined automatically, as we
> would know its origin. The code path from which an event type
> originates is, I think, the critical thing for defining an event
> category. The LWLock events already do that in lwlock.c.

I'm very skeptical of grouping everything that waits using latches as
a latch wait, but maybe it's OK to do it that way.  I was thinking
more of adding categories like "client wait" with events like "client
read" and "client write".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-15 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Mar 15, 2016 at 10:41 AM, Thom Brown  wrote:
> >> It turns out that I hate the fact that the Wait Event Name column is
> >> effectively in a random order.  If a user sees a message, and goes to
> >> look up the value in the wait_event description table, they either
> >> have to search with their browser/PDF viewer, or scan down the list
> >> looking for the item they're looking for, not knowing how far down it
> >> will be.  The same goes for wait event type.

> Hmm, I'm not sure this is a good idea.  I don't think it's crazy to
> report the locks in the order they are defined in the source code;
> many people will be familiar with that order, and it might make the
> list easier to maintain.  On the other hand, I'm also not sure this is
> a bad idea.  Alphabetical order is a widely-used standard.  So, I'm
> going to abstain from any strong position here and ask what other
> people think of Thom's proposed change.

I think using implementation order is crazy.  +1 for alphabetical.  If
this really makes devs' lives more difficult (and I disagree that it
does), let's reorder the items in the source code too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-15 Thread Robert Haas
On Tue, Mar 15, 2016 at 10:41 AM, Thom Brown  wrote:
>> It turns out that I hate the fact that the Wait Event Name column is
>> effectively in a random order.  If a user sees a message, and goes to
>> look up the value in the wait_event description table, they either
>> have to search with their browser/PDF viewer, or scan down the list
>> looking for the item they're looking for, not knowing how far down it
>> will be.  The same goes for wait event type.
>>
>> I've attached a patch to sort the list by wait event type and then
>> wait event name.  It also corrects minor SGML indenting issues.
>
> Let's try that again, this time without duplicating a row, and omitting 
> another.

Hmm, I'm not sure this is a good idea.  I don't think it's crazy to
report the locks in the order they are defined in the source code;
many people will be familiar with that order, and it might make the
list easier to maintain.  On the other hand, I'm also not sure this is
a bad idea.  Alphabetical order is a widely-used standard.  So, I'm
going to abstain from any strong position here and ask what other
people think of Thom's proposed change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-15 Thread Robert Haas
On Tue, Mar 15, 2016 at 9:17 AM, Thomas Reiss  wrote:
> Here's a small docpatch to fix two typos in the new documentation.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-15 Thread Thom Brown
On 15 March 2016 at 14:00, Thom Brown  wrote:
> On 10 March 2016 at 18:58, Robert Haas  wrote:
>> On Thu, Mar 10, 2016 at 12:18 AM, Amit Kapila  
>> wrote:
>>> On Wed, Mar 9, 2016 at 7:17 PM, Robert Haas  wrote:

 On Wed, Mar 9, 2016 at 8:31 AM, Amit Kapila 
 wrote:
 > Thanks for the suggestion.  I have updated the patch to include
 > wait_event_type information in the wait_event table.

 I think we should remove "a server process is" from all of these entries.

 Also, I think this kind of thing should be tightened up:

 + A server process is waiting on any one of the
 commit_timestamp
 + buffer locks to read or write the commit_timestamp page in the
 + pg_commit_ts subdirectory.

 I'd just write: Waiting to read or write a commit timestamp buffer.

>>>
>>> Okay, changed as per suggestion and fixed the morerows issue pointed
>>> out by Thom.
>>
>> Committed with some further editing.  In particular, the way you
>> determined whether we could safely access the tranche information for
>> any given ID was wrong; please check over what I did and make sure
>> that isn't also wrong.
>>
>> Whew, this was a long process, but we got there.  Some initial pgbench
>> testing shows that by far the most common wait event observed on that
>> workload is WALWriteLock, which is pretty interesting: perf -e cs and
>> LWLOCK_STATS let you measure the most *frequent* wait events, but that
>> ignores duration.  Sampling pg_stat_activity tells you which things
>> you're spending the most *time* waiting for, which is awfully neat.
>
> It turns out that I hate the fact that the Wait Event Name column is
> effectively in a random order.  If a user sees a message, and goes to
> look up the value in the wait_event description table, they either
> have to search with their browser/PDF viewer, or scan down the list
> looking for the item they're looking for, not knowing how far down it
> will be.  The same goes for wait event type.
>
> I've attached a patch to sort the list by wait event type and then
> wait event name.  It also corrects minor SGML indenting issues.

Let's try that again, this time without duplicating a row, and omitting another.

Thom


sort_wait_event_doc_table_v2.patch
Description: binary/octet-stream



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-15 Thread Thom Brown
On 10 March 2016 at 18:58, Robert Haas  wrote:
> On Thu, Mar 10, 2016 at 12:18 AM, Amit Kapila  wrote:
>> On Wed, Mar 9, 2016 at 7:17 PM, Robert Haas  wrote:
>>>
>>> On Wed, Mar 9, 2016 at 8:31 AM, Amit Kapila 
>>> wrote:
>>> > Thanks for the suggestion.  I have updated the patch to include
>>> > wait_event_type information in the wait_event table.
>>>
>>> I think we should remove "a server process is" from all of these entries.
>>>
>>> Also, I think this kind of thing should be tightened up:
>>>
>>> + A server process is waiting on any one of the
>>> commit_timestamp
>>> + buffer locks to read or write the commit_timestamp page in the
>>> + pg_commit_ts subdirectory.
>>>
>>> I'd just write: Waiting to read or write a commit timestamp buffer.
>>>
>>
>> Okay, changed as per suggestion and fixed the morerows issue pointed
>> out by Thom.
>
> Committed with some further editing.  In particular, the way you
> determined whether we could safely access the tranche information for
> any given ID was wrong; please check over what I did and make sure
> that isn't also wrong.
>
> Whew, this was a long process, but we got there.  Some initial pgbench
> testing shows that by far the most common wait event observed on that
> workload is WALWriteLock, which is pretty interesting: perf -e cs and
> LWLOCK_STATS let you measure the most *frequent* wait events, but that
> ignores duration.  Sampling pg_stat_activity tells you which things
> you're spending the most *time* waiting for, which is awfully neat.

It turns out that I hate the fact that the Wait Event Name column is
effectively in a random order.  If a user sees a message, and goes to
look up the value in the wait_event description table, they either
have to search with their browser/PDF viewer, or scan down the list
looking for the item they're looking for, not knowing how far down it
will be.  The same goes for wait event type.

I've attached a patch to sort the list by wait event type and then
wait event name.  It also corrects minor SGML indenting issues.

Thom


sort_wait_event_doc_table.patch
Description: binary/octet-stream



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-15 Thread Thomas Reiss

Hello,

Here's a small docpatch to fix two typos in the new documentation.

Regards,
Thomas


On 11/03/2016 07:19, Amit Kapila wrote:
On Fri, Mar 11, 2016 at 12:28 AM, Robert Haas wrote:

>
>
> Committed with some further editing.  In particular, the way you
> determined whether we could safely access the tranche information for
> any given ID was wrong; please check over what I did and make sure
> that isn't also wrong.
>

There are a few typos which I have tried to fix with the attached
patch.  Can you tell me what was wrong with the way it was done in the patch?



@@ -4541,9 +4542,10 @@ AbortSubTransaction(void)
  */
  LWLockReleaseAll();

+ pgstat_report_wait_end();
+ pgstat_progress_end_command();
  AbortBufferIO();
  UnlockBuffers();
- pgstat_progress_end_command();

 /* Reset WAL record construction state */
 XLogResetInsertion();
@@ -4653,6 +4655,9 @@ AbortSubTransaction(void)
  */
  XactReadOnly = s->prevXactReadOnly;

+ /* Report wait end here, when there is no further possibility of wait */
+ pgstat_report_wait_end();
+
  RESUME_INTERRUPTS();
 }

AbortSubTransaction() does call pgstat_report_wait_end() twice; is
this intentional? I have kept it at the end because there is a chance
that in-between APIs can again set the state to wait, and also because by
that time we have not released buffer pins and heavyweight locks, so I am
not sure it makes sense to report wait end at the earlier stage.  I have
noticed that in WaitOnLock(), on error, the wait end is set, but now,
thinking about it again, it seems it would be better to set it in
AbortTransaction/AbortSubTransaction at the end.  What do you think?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com 




diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ec5328e..199f38a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -657,7 +657,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
 
  
   Lock: The backend is waiting for a heavyweight lock.
-  Heayweight locks, also known as lock manager locks or simply locks,
+  Heavyweight locks, also known as lock manager locks or simply locks,
   primarily protect SQL-visible objects such as tables.  However,
   they are also used to ensure mutual exclusion for certain internal
   operations such as relation extension.  wait_event will
@@ -965,7 +965,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
 
  LWLockTranche
  clog
- Waiting for I/O on a clog (transcation status) buffer.
+ Waiting for I/O on a clog (transaction status) buffer.
 
 
  commit_timestamp



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-11 Thread Robert Haas
On Fri, Mar 11, 2016 at 1:19 AM, Amit Kapila  wrote:
> On Fri, Mar 11, 2016 at 12:28 AM, Robert Haas  wrote:
>> Committed with some further editing.  In particular, the way you
>> determined whether we could safely access the tranche information for
>> any given ID was wrong; please check over what I did and make sure
>> that isn't also wrong.
>>
> There are a few typos which I have tried to fix with the attached patch.  Can
> you tell me what was wrong with the way it was done in the patch?

LWLocks can be taken during transaction abort and you might have to
wait, so keeping the wait state around from before you started to
abort the transaction doesn't make any sense.  You aren't waiting at
that point, and if you get stuck in some part of the code that isn't
instrumented to expose wait information, you certainly don't want
pg_stat_activity to still reflect the way things were when you were
previously waiting on something else.  It's essential that we clear
the wait state as early as possible - right after LWLockReleaseAll -
because at that point we are no longer waiting.  Then you no longer
need to do it later on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-10 Thread Amit Kapila
On Fri, Mar 11, 2016 at 12:28 AM, Robert Haas  wrote:
>
>
> Committed with some further editing.  In particular, the way you
> determined whether we could safely access the tranche information for
> any given ID was wrong; please check over what I did and make sure
> that isn't also wrong.
>

There are a few typos which I have tried to fix with the attached patch.  Can
you tell me what was wrong with the way it was done in the patch?


@@ -4541,9 +4542,10 @@ AbortSubTransaction(void)
  */
  LWLockReleaseAll();

+ pgstat_report_wait_end();
+ pgstat_progress_end_command();
  AbortBufferIO();
  UnlockBuffers();
- pgstat_progress_end_command();

  /* Reset WAL record construction state */
  XLogResetInsertion();
@@ -4653,6 +4655,9 @@ AbortSubTransaction(void)
  */
  XactReadOnly = s->prevXactReadOnly;

+ /* Report wait end here, when there is no further possibility of wait */
+ pgstat_report_wait_end();
+
  RESUME_INTERRUPTS();
 }

AbortSubTransaction() does call pgstat_report_wait_end() twice; is this
intentional? I have kept it at the end because there is a chance that
in-between APIs can again set the state to wait, and also because by that
time we have not released buffer pins and heavyweight locks, so I am not
sure it makes sense to report wait end at the earlier stage.  I have
noticed that in WaitOnLock(), on error, the wait end is set, but now,
thinking about it again, it seems it would be better to set it in
AbortTransaction/AbortSubTransaction at the end.  What do you think?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_typo_lwlock_v1.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 12:18 AM, Amit Kapila  wrote:
> On Wed, Mar 9, 2016 at 7:17 PM, Robert Haas  wrote:
>>
>> On Wed, Mar 9, 2016 at 8:31 AM, Amit Kapila 
>> wrote:
>> > Thanks for the suggestion.  I have updated the patch to include
>> > wait_event_type information in the wait_event table.
>>
>> I think we should remove "a server process is" from all of these entries.
>>
>> Also, I think this kind of thing should be tightened up:
>>
>> + A server process is waiting on any one of the
>> commit_timestamp
>> + buffer locks to read or write the commit_timestamp page in the
>> + pg_commit_ts subdirectory.
>>
>> I'd just write: Waiting to read or write a commit timestamp buffer.
>>
>
> Okay, changed as per suggestion and fixed the morerows issue pointed by
> Thom.

Committed with some further editing.  In particular, the way you
determined whether we could safely access the tranche information for
any given ID was wrong; please check over what I did and make sure
that isn't also wrong.

Whew, this was a long process, but we got there.  Some initial pgbench
testing shows that by far the most common wait event observed on that
workload is WALWriteLock, which is pretty interesting: perf -e cs and
LWLOCK_STATS let you measure the most *frequent* wait events, but that
ignores duration.  Sampling pg_stat_activity tells you which things
you're spending the most *time* waiting for, which is awfully neat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-09 Thread Amit Kapila
On Wed, Mar 9, 2016 at 7:17 PM, Robert Haas  wrote:
>
> On Wed, Mar 9, 2016 at 8:31 AM, Amit Kapila wrote:
> > Thanks for the suggestion.  I have updated the patch to include
> > wait_event_type information in the wait_event table.
>
> I think we should remove "a server process is" from all of these entries.
>
> Also, I think this kind of thing should be tightened up:
>
> + A server process is waiting on any one of the
commit_timestamp
> + buffer locks to read or write the commit_timestamp page in the
> + pg_commit_ts subdirectory.
>
> I'd just write: Waiting to read or write a commit timestamp buffer.
>

Okay, changed as per suggestion and fixed the morerows issue pointed by
Thom.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_pg_stat_activity_v14.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-09 Thread Thom Brown
On 9 March 2016 at 13:31, Amit Kapila  wrote:

> On Fri, Mar 4, 2016 at 7:11 PM, Alexander Korotkov 
> wrote:
> >
> >>
> >> If yes, then the only slight worry is that there will be a lot of repetition
> in the wait_event_type column; otherwise it is okay.
> >
> >
> > There is morerows attribute of entry tag in Docbook SGML, it behaves
> like rowspan in HTML.
> >
>
> Thanks for the suggestion.  I have updated the patch to include
> wait_event_type information in the wait_event table.
>
> As asked above by Robert, below is performance data with the patch.
>
> M/C Details
> --
> IBM POWER-8 24 cores, 192 hardware threads
> RAM = 492GB
>
> Performance Data
> 
> min_wal_size=15GB
> max_wal_size=20GB
> checkpoint_timeout=15min
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.9
>
>
> pgbench read-only (median of 3, 5-min runs)
>
> clients BASE PATCH %
> 1 19703.549206 19992.141542 1.4646718364
> 8 120105.542849 127717.835367 6.3380026745
> 64 487334.338764 495861.7211254 1.7498012521
>
>
> The read-only data shows some improvement with patch, but I think this is
> mostly attributed to run-to-run variation.
>
> pgbench read-write (median of 3, 30-min runs)
>
> clients BASE PATCH %
> 1 1703.275728 1696.568881 -0.3937616729
> 8 8884.406185 9442.387472 6.2804567394
> 64 32648.82798 32113.002416 -1.6411785572
>
>
> In the above data, the read-write data shows small regression (1.6%) at
> higher client-count, but when I ran individually that test, the difference
> was 0.5%. I think it is mostly attributed to run-to-run variation as we see
> with read-only tests.
>
>
> Thanks to Mithun C Y for doing performance testing of this patch.
>
> As this patch is adding 4-byte variable to shared memory structure PGPROC,
> so this is susceptible to memory alignment issues for shared buffers as
> discussed in thread [1], but in general the performance data doesn't
> indicate any regression.
>
> [1] -
> http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
>
>
The new patch looks fine with regard to grammar and spelling.

However, the new row-spanning layout isn't declared correctly: you've
over-counted by 1 in each morerows attribute, possibly because you equated
it with the rowspan attribute in HTML, which counts the current row as well
as the ones below it.  "morerows" isn't the total number of rows, but how
many more rows, in addition to the current one, the entry will span.
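
To make the off-by-one concrete, a hypothetical fragment (not taken from the patch): an entry meant to cover three table rows takes morerows="2" in DocBook, where HTML would take rowspan="3".

```sgml
<!-- DocBook: this entry spans the current row plus 2 more (3 rows total) -->
<entry morerows="2">LWLockNamed</entry>
<!-- HTML equivalent would be: <td rowspan="3">LWLockNamed</td> -->
```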

And yes, as Robert mentioned, please can we remove the "A server process
is" repetition?

Thom


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-09 Thread Robert Haas
On Wed, Mar 9, 2016 at 8:31 AM, Amit Kapila  wrote:
> Thanks for the suggestion.  I have updated the patch to include 
> wait_event_type information in the wait_event table.

I think we should remove "a server process is" from all of these entries.

Also, I think this kind of thing should be tightened up:

+ A server process is waiting on any one of the commit_timestamp
+ buffer locks to read or write the commit_timestamp page in the
+ pg_commit_ts subdirectory.

I'd just write: Waiting to read or write a commit timestamp buffer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-09 Thread Amit Kapila
On Fri, Mar 4, 2016 at 7:11 PM, Alexander Korotkov 
wrote:
>
>>
>> If yes, then the only slight worry is that there will be a lot of repetition
in the wait_event_type column; otherwise it is okay.
>
>
> There is morerows attribute of entry tag in Docbook SGML, it behaves like
rowspan in HTML.
>

Thanks for the suggestion.  I have updated the patch to include
wait_event_type information in the wait_event table.

As asked above by Robert, below is performance data with the patch.

M/C Details
--
IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB

Performance Data

min_wal_size=15GB
max_wal_size=20GB
checkpoint_timeout=15min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9


pgbench read-only (median of 3, 5-min runs)

clients BASE PATCH %
1 19703.549206 19992.141542 1.4646718364
8 120105.542849 127717.835367 6.3380026745
64 487334.338764 495861.7211254 1.7498012521


The read-only data shows some improvement with patch, but I think this is
mostly attributed to run-to-run variation.

pgbench read-write (median of 3, 30-min runs)

clients BASE PATCH %
1 1703.275728 1696.568881 -0.3937616729
8 8884.406185 9442.387472 6.2804567394
64 32648.82798 32113.002416 -1.6411785572


In the above data, the read-write data shows small regression (1.6%) at
higher client-count, but when I ran individually that test, the difference
was 0.5%. I think it is mostly attributed to run-to-run variation as we see
with read-only tests.


Thanks to Mithun C Y for doing performance testing of this patch.

As this patch is adding 4-byte variable to shared memory structure PGPROC,
so this is susceptible to memory alignment issues for shared buffers as
discussed in thread [1], but in general the performance data doesn't
indicate any regression.

[1] -
http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_pg_stat_activity_v13.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 7:23 PM, Thom Brown  wrote:
>
> On 4 March 2016 at 13:41, Alexander Korotkov  wrote:
> >
> >>
> >> If yes, then the only slight worry is that there will be a lot of
repetition in
> >> the wait_event_type column; otherwise it is okay.
> >
> >
> > There is morerows attribute of entry tag in Docbook SGML, it behaves
like
> > rowspan in HTML.
>
> +1
>

I will try to use morerows in documentation.

> Yes, we do this elsewhere in the docs.  And it is difficult to look
> through the wait event names at the moment.
>
> I'm also not keen on all the "A server process is waiting" all the way
> down the list.
>

How about giving the column name as "Wait For" instead of "Description" and
then using text like "finding or allocating space in shared memory"?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Thom Brown
On 4 March 2016 at 13:41, Alexander Korotkov  wrote:
> On Fri, Mar 4, 2016 at 4:20 PM, Amit Kapila  wrote:
>>
>> On Fri, Mar 4, 2016 at 4:01 PM, Alexander Korotkov 
>> wrote:
>>>
>>> On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila 
>>> wrote:

 > I think the wait event types should be documented - and the wait
 > events too, perhaps.
 >

 As discussed upthread, I have added documentation for all the possible
 wait events and an example.  Some of the LWLocks like AsyncQueueLock and
 AsyncCtlLock are used for quite similar purposes, so I have kept their
 explanations the same.
>>>
>>>
>>> Do you think it worth grouping rows in "wait_event Description" table by
>>> wait event type?
>>
>>
>> They are already grouped (implicitly), do you mean to say that we should
>> add wait event type name as well in that table?
>
>
> Yes.
>
>>
>> If yes, then the only slight worry is that there will be a lot of repetition in
>> the wait_event_type column; otherwise it is okay.
>
>
> There is morerows attribute of entry tag in Docbook SGML, it behaves like
> rowspan in HTML.

+1

Yes, we do this elsewhere in the docs.  And it is difficult to look
through the wait event names at the moment.

I'm also not keen on all the "A server process is waiting" all the way
down the list.

Thom




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Thom Brown
On 4 March 2016 at 13:35, Thom Brown  wrote:
> On 4 March 2016 at 04:05, Amit Kapila  wrote:
>> On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas  wrote:
>>>
>>> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila 
>>> wrote:
>>> >> I wouldn't bother tinkering with it at this point.  The value isn't
>>> >> going to be recorded on disk anywhere, so it will be easy to change
>>> >> the way it's computed in the future if we ever need to do that.
>>> >>
>>> >
>>> > Okay. Find the rebased patch attached with this mail.  I have moved
>>> > this patch to upcoming CF.
>>>
>>> I would call the functions pgstat_report_wait_start() and
>>> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
>>> pgstat_report_end_waiting().
>>>
>>
>> Changed as per suggestion and made these functions inline.
>>
>>> I think pgstat_get_wait_event_type should not return HWLock, a term
>>> that appears nowhere in our source tree at present.  How about just
>>> "Lock"?
>>>
>>
>> Changed as per suggestion.
>>
>>> I think the wait event types should be documented - and the wait
>>> events too, perhaps.
>>>
>>
>> As discussed upthread, I have added documentation for all the possible wait
>> events and an example.  Some of the LWLocks like AsyncQueueLock and
>> AsyncCtlLock are used for quite similar purposes, so I have kept their
>> explanations the same.
>>
>>> Maybe it's worth having separate wait event type names for lwlocks and
>>> lwlock tranches.  We could report LWLockNamed and LWLockTranche and
>>> document the difference: "LWLockNamed indicates that the backend is
>>> waiting for a specific, named LWLock.  The event name is the name of
>>> that lock.  LWLockTranche indicates that the backend is waiting for
>>> any one of a group of locks with similar function.  The event name
>>> identifies the general purpose of locks in that group."
>>>
>>
>> Changed as per suggestion.
>>
>>> There's no requirement that every session have every tranche
>>> registered.  I think we should consider displaying "extension" for any
>>> tranche that's not built-in, or at least for tranches that are not
>>> registered (rather than "unknown wait event").
>>>
>>> +   if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)
>>>
>>> Isn't the second part automatically true at this point?
>>>
>>
>> Fixed.
>>
>>> The changes to LockBufferForCleanup() don't look right to me.  Waiting
>>> for a buffer pin might be a reasonable thing to define as a wait
>>> event, but it shouldn't reported as if we were waiting on the LWLock
>>> itself.
>>>
>>
>> As discussed upthread, added a new wait event BufferPin for this case.
>>
>>> What happens if an error is thrown while we're in a wait?
>>>
>>
>> As discussed upthread, added in AbortTransaction and wherever
>> LWLockReleaseAll() gets called; the point to note is that we can call this
>> function only when we are sure there is no further possibility of a wait on
>> an LWLock.
>>
>>> Does this patch hurt performance?
>>>
>>
>> Performance tests are underway.
>
> I've attached a revised version of the patch with the following corrections:
>
> + 
> +  LWLockTranche: The backend is waiting for any one of a
> +  group of locks with similar function.  The wait_event
> +  name for this type of wait identifies the general purpose of locks
> +  in that group.
> + 
>
> s/with similar/with a similar/
>
> +
> + ControlFileLock
> + A server process is waiting to read or update the control 
> file
> + or creation of a new WAL log file.
> +
>
> As the L in WAL stands for "log" anyway, I think the extra "log" word
> can be removed.
>
> +
> + RelCacheInitLock
> + A server process is waiting to read or write to relation 
> cache
> + initialization file.
> +
>
> s/to relation/to the relation/
>
> +
> + BtreeVacuumLock
> +  A server process is waiting to read or update vacuum related
> +  information for Btree index.
> +
>
> s/vacuum related/vacuum-related/
> s/for Btree/for a Btree/
>
> +
> + AutovacuumLock
> + A server process which could be autovacuum worker is waiting 
> to
> + update or read current state of autovacuum workers.
> +
>
> s/could be autovacuum/could be that an autovacuum/
> s/read current/read the current/
>
> (discussed with Amit offline about other sources of wait, and he
> suggested autovacuum launcher, so I've added that in too)
>
> +
> + AutovacuumScheduleLock
> + A server process is waiting on another process to ensure that
> + the table it has selected for vacuum still needs vacuum.
> + 
> +
>
> s/for vacuum/for a vacuum/
> s/still needs vacuum/still needs vacuuming/
>
> +
> + SyncScanLock
> + A server process is waiting to get the start location 

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Alexander Korotkov
On Fri, Mar 4, 2016 at 4:20 PM, Amit Kapila  wrote:

> On Fri, Mar 4, 2016 at 4:01 PM, Alexander Korotkov 
> wrote:
>
>> On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila 
>> wrote:
>>
>>> > I think the wait event types should be documented - and the wait
>>> > events too, perhaps.
>>> >
>>>
>>> As discussed upthread, I have added documentation for all the possible
>>> wait events and an example.  Some of the LWLocks like AsyncQueueLock and
>>> AsyncCtlLock are used for quite similar purposes, so I have kept their
>>> explanations the same.
>>>
>>
>> Do you think it worth grouping rows in "wait_event Description" table by
>> wait event type?
>>
>
> They are already grouped (implicitly), do you mean to say that we should
> add wait event type name as well in that table?
>

Yes.


> If yes, then the only slight worry is that there will be a lot of repetition in
> the wait_event_type column; otherwise it is okay.
>

There is morerows attribute of entry tag in Docbook SGML, it behaves like
rowspan in HTML.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Thom Brown
On 4 March 2016 at 04:05, Amit Kapila  wrote:
> On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas  wrote:
>>
>> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila 
>> wrote:
>> >> I wouldn't bother tinkering with it at this point.  The value isn't
>> >> going to be recorded on disk anywhere, so it will be easy to change
>> >> the way it's computed in the future if we ever need to do that.
>> >>
>> >
>> > Okay. Find the rebased patch attached with this mail.  I have moved
>> > this patch to upcoming CF.
>>
>> I would call the functions pgstat_report_wait_start() and
>> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
>> pgstat_report_end_waiting().
>>
>
> Changed as per suggestion and made these functions inline.
>
>> I think pgstat_get_wait_event_type should not return HWLock, a term
>> that appears nowhere in our source tree at present.  How about just
>> "Lock"?
>>
>
> Changed as per suggestion.
>
>> I think the wait event types should be documented - and the wait
>> events too, perhaps.
>>
>
> As discussed upthread, I have added documentation for all the possible wait
> events and an example.  Some of the LWLocks like AsyncQueueLock and
> AsyncCtlLock are used for quite similar purposes, so I have kept their
> explanations the same.
>
>> Maybe it's worth having separate wait event type names for lwlocks and
>> lwlock tranches.  We could report LWLockNamed and LWLockTranche and
>> document the difference: "LWLockNamed indicates that the backend is
>> waiting for a specific, named LWLock.  The event name is the name of
>> that lock.  LWLockTranche indicates that the backend is waiting for
>> any one of a group of locks with similar function.  The event name
>> identifies the general purpose of locks in that group."
>>
>
> Changed as per suggestion.
>
>> There's no requirement that every session have every tranche
>> registered.  I think we should consider displaying "extension" for any
>> tranche that's not built-in, or at least for tranches that are not
>> registered (rather than "unknown wait event").
>>
>> +   if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)
>>
>> Isn't the second part automatically true at this point?
>>
>
> Fixed.
>
>> The changes to LockBufferForCleanup() don't look right to me.  Waiting
>> for a buffer pin might be a reasonable thing to define as a wait
>> event, but it shouldn't reported as if we were waiting on the LWLock
>> itself.
>>
>
> As discussed upthread, added a new wait event BufferPin for this case.
>
>> What happens if an error is thrown while we're in a wait?
>>
>
> As discussed upthread, added in AbortTransaction and wherever
> LWLockReleaseAll() gets called; the point to note is that we can call this
> function only when we are sure there is no further possibility of a wait on
> an LWLock.
>
>> Does this patch hurt performance?
>>
>
> Performance tests are underway.

I've attached a revised version of the patch with the following corrections:

+ 
+  LWLockTranche: The backend is waiting for any one of a
+  group of locks with similar function.  The wait_event
+  name for this type of wait identifies the general purpose of locks
+  in that group.
+ 

s/with similar/with a similar/

+
+ ControlFileLock
+ A server process is waiting to read or update the control file
+ or creation of a new WAL log file.
+

As the L in WAL stands for "log" anyway, I think the extra "log" word
can be removed.

+
+ RelCacheInitLock
+ A server process is waiting to read or write to relation cache
+ initialization file.
+

s/to relation/to the relation/

+
+ BtreeVacuumLock
+  A server process is waiting to read or update vacuum related
+  information for Btree index.
+

s/vacuum related/vacuum-related/
s/for Btree/for a Btree/

+
+ AutovacuumLock
+ A server process which could be autovacuum worker is waiting to
+ update or read current state of autovacuum workers.
+

s/could be autovacuum/could be that an autovacuum/
s/read current/read the current/

(discussed with Amit offline about other sources of wait, and he
suggested autovacuum launcher, so I've added that in too)

+
+ AutovacuumScheduleLock
+ A server process is waiting on another process to ensure that
+ the table it has selected for vacuum still needs vacuum.
+ 
+

s/for vacuum/for a vacuum/
s/still needs vacuum/still needs vacuuming/

+
+ SyncScanLock
+ A server process is waiting to get the start location of scan
+ on table for synchronized scans.
+

s/of scan/of a scan/
s/on table/on a table/

+
+ SerializableFinishedListLock
+ A server process is waiting to access list of finished
+ serializable transactions.
+  

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 4:01 PM, Alexander Korotkov 
wrote:

> On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila 
> wrote:
>
>> > I think the wait event types should be documented - and the wait
>> > events too, perhaps.
>> >
>>
>> As discussed upthread, I have added documentation for all the possible
>> wait events and an example.  Some of the LWLocks like AsyncQueueLock and
>> AsyncCtlLock are used for quite similar purposes, so I have kept their
>> explanations the same.
>>
>
> Do you think it worth grouping rows in "wait_event Description" table by
> wait event type?
>

They are already grouped (implicitly); do you mean that we should add the
wait event type name to that table as well? If yes, then the only slight
worry is that there will be a lot of repetition in the wait_event_type
column; otherwise it is okay.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Alexander Korotkov
On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila  wrote:

> > I think the wait event types should be documented - and the wait
> > events too, perhaps.
> >
>
> As discussed upthread, I have added documentation for all the possible
> wait events and an example.  Some of the LWLocks like AsyncQueueLock and
> AsyncCtlLock are used for quite similar purposes, so I have kept their
> explanations the same.
>

Do you think it is worth grouping the rows in the "wait_event Description"
table by wait event type? It would be nice if the user knew a priori which
wait event has which type. Readability of this large table would also be
improved.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-03 Thread Amit Kapila
On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas  wrote:
>
> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila 
wrote:
> >> I wouldn't bother tinkering with it at this point.  The value isn't
> >> going to be recorded on disk anywhere, so it will be easy to change
> >> the way it's computed in the future if we ever need to do that.
> >>
> >
> > Okay. Find the rebased patch attached with this mail.  I have moved
> > this patch to upcoming CF.
>
> I would call the functions pgstat_report_wait_start() and
> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
> pgstat_report_end_waiting().
>

Changed as per suggestion and made these functions inline.

> I think pgstat_get_wait_event_type should not return HWLock, a term
> that appears nowhere in our source tree at present.  How about just
> "Lock"?
>

Changed as per suggestion.

> I think the wait event types should be documented - and the wait
> events too, perhaps.
>

As discussed upthread, I have added documentation for all the possible wait
events and an example.  Some of the LWLocks like AsyncQueueLock and
AsyncCtlLock are used for quite similar purposes, so I have kept their
explanations the same.

> Maybe it's worth having separate wait event type names for lwlocks and
> lwlock tranches.  We could report LWLockNamed and LWLockTranche and
> document the difference: "LWLockNamed indicates that the backend is
> waiting for a specific, named LWLock.  The event name is the name of
> that lock.  LWLockTranche indicates that the backend is waiting for
> any one of a group of locks with similar function.  The event name
> identifies the general purpose of locks in that group."
>

Changed as per suggestion.

> There's no requirement that every session have every tranche
> registered.  I think we should consider displaying "extension" for any
> tranche that's not built-in, or at least for tranches that are not
> registered (rather than "unknown wait event").
>
> +   if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)
>
> Isn't the second part automatically true at this point?
>

Fixed.

> The changes to LockBufferForCleanup() don't look right to me.  Waiting
> for a buffer pin might be a reasonable thing to define as a wait
> event, but it shouldn't reported as if we were waiting on the LWLock
> itself.
>

As discussed upthread, added a new wait event BufferPin for this case.

> What happens if an error is thrown while we're in a wait?
>

As discussed upthread, added in AbortTransaction and wherever
LWLockReleaseAll() gets called; the point to note is that we can call this
function only when we are sure there is no further possibility of a wait on
an LWLock.

> Does this patch hurt performance?
>

Performance tests are underway.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_pg_stat_activity_v11.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-01 Thread Robert Haas
On Mon, Feb 29, 2016 at 8:01 AM, Amit Kapila  wrote:
> On Thu, Feb 25, 2016 at 2:54 AM, Peter Eisentraut  wrote:
>>
>> Could you enhance the documentation about the difference between "wait
>> event type name" and "wait event name" (examples?)?
>>
>
> I am planning to add the possible values for each wait event type and
> wait event, and will add a few examples as well.  Let me know if you want
> to see something else with respect to the documentation.

That's pretty much what I had in mind.  I imagine that most of the
list of wait events will be a list of the individual LWLocks, which I
suppose then will each need a brief description.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-29 Thread Amit Kapila
On Thu, Feb 25, 2016 at 2:54 AM, Peter Eisentraut  wrote:
>
> Could you enhance the documentation about the difference between "wait
> event type name" and "wait event name" (examples?)?
>

I am planning to add the possible values for each wait event type and
wait event, and will add a few examples as well.  Let me know if you want
to see something else with respect to the documentation.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-29 Thread Amit Kapila
On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas  wrote:
>
> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila 
wrote:
> >> I wouldn't bother tinkering with it at this point.  The value isn't
> >> going to be recorded on disk anywhere, so it will be easy to change
> >> the way it's computed in the future if we ever need to do that.
> >>
> >
> > Okay. Find the rebased patch attached with this mail.  I have moved
> > this patch to upcoming CF.
>
> I would call the functions pgstat_report_wait_start() and
> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
> pgstat_report_end_waiting().
>
> I think pgstat_get_wait_event_type should not return HWLock, a term
> that appears nowhere in our source tree at present.  How about just
> "Lock"?
>
> I think the wait event types should be documented - and the wait
> events too, perhaps.
>

By the above, do you mean that we should document the name of each wait
event type and wait event?  Documenting wait event names is okay, but we
have approximately 65~70 wait events (counting individual LWLocks,
tranches, locks, etc.).  If we want to document all the events, then I
think we can have a separate table with columns (wait event name,
description) just below pg_stat_activity, and link to that table from the
wait_event row of the pg_stat_activity table.  Does that match your
thoughts, or do you have something else in mind?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-24 Thread Robert Haas
On Thu, Feb 25, 2016 at 10:31 AM, Amit Kapila  wrote:
>> There's no requirement that every session have every tranche
>> registered.  I think we should consider displaying "extension" for any
>> tranche that's not built-in, or at least for tranches that are not
>> registered (rather than "unknown wait event").
>
> I think it is better to display "extension" for unregistered tranches,
> but do you see any case where we will have wait event information
> for an unregistered tranche?

Sure.  If backend A has the tranche registered - because it loaded
some .so which registered it in _PG_init() - and is waiting on the
lock, and backend B does not have the tranche registered - because it
didn't do that - and backend B selects from pg_stat_activity, then
you'll see a wait in an unregistered tranche.

>> The changes to LockBufferForCleanup() don't look right to me.  Waiting
>> for a buffer pin might be a reasonable thing to define as a wait
>> event, but it shouldn't reported as if we were waiting on the LWLock
>> itself.
>
> Makes sense; how about having a new wait class, something like
> WAIT_BUFFER, with the wait event type as Buffer and the
> wait event as BufferPin?  At the moment, I think there will be
> only one event in this class, but it seems to me that waiting on a buffer
> has merit as a separate class.

I would just go with BufferPin/BufferPin for now.  I can't think what
else I'd want to group with BufferPins in the same class.

>> What happens if an error is thrown while we're in a wait?
>
> For LWLocks, only a FATAL error is possible, which will anyway lead
> to initialization of all backend state.  For lock.c, if an error is
> thrown, then the state is reset in the catch block.  In
> LockBufferForCleanup(), after we set the wait event and before we
> reset it, there is only a chance of a FATAL error if a system call
> fails.  We do have one error in enable_timeouts, which is called from
> ResolveRecoveryConflictWithBufferPin(), but that doesn't seem to
> be possible.  Now one question to answer is: what if tomorrow
> someone adds a new error after we set the wait state?  So maybe
> it is better to clear the wait event in AbortTransaction()?

Yeah, I think so.  Probably everywhere that we do LWLockReleaseAll()
we should also clear the wait event.

>> Does this patch hurt performance?
>
> This patch adds additional code in the path where we are going
> to sleep/wait, and we have changed some shared memory structures, so
> it is good to verify with performance tests.  I am planning to run read-only
> and read-write pgbench tests (when the data fits in shared buffers); is that
> sufficient, or do you expect anything more?

I'm open to ideas from others on tests that would be good to run, but
I don't have any great ideas myself right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-24 Thread Amit Kapila
On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas  wrote:

> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila 
> wrote:
> >> I wouldn't bother tinkering with it at this point.  The value isn't
> >> going to be recorded on disk anywhere, so it will be easy to change
> >> the way it's computed in the future if we ever need to do that.
> >>
> >
> > Okay. Find the rebased patch attached with this mail.  I have moved
> > this patch to upcoming CF.
>
> I would call the functions pgstat_report_wait_start() and
> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
> pgstat_report_end_waiting().
>
> I think pgstat_get_wait_event_type should not return HWLock, a term
> that appears nowhere in our source tree at present.  How about just
> "Lock"?
>
> I think the wait event types should be documented - and the wait
> events too, perhaps.
>
> Maybe it's worth having separate wait event type names for lwlocks and
> lwlock tranches.  We could report LWLockNamed and LWLockTranche and
> document the difference: "LWLockNamed indicates that the backend is
> waiting for a specific, named LWLock.  The event name is the name of
> that lock.  LWLockTranche indicates that the backend is waiting for
> any one of a group of locks with similar function.  The event name
> identifies the general purpose of locks in that group."
>
>
Agreed with all the above points and will change the patch accordingly.


> There's no requirement that every session have every tranche
> registered.  I think we should consider displaying "extension" for any
> tranche that's not built-in, or at least for tranches that are not
> registered (rather than "unknown wait event").
>
>
I think it is better to display "extension" for unregistered tranches,
but do you see any case where we will have wait event information
for any unregistered tranche?

Another point to consider: if it is not possible to have a wait event
for an unregistered tranche, then should we have an Assert or elog(ERROR)
instead of "unknown wait event"?



> +   if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)
>
> Isn't the second part automatically true at this point?
>


Yes, that point is automatically true and I think we should change
the same check in PRINT_LWDEBUG and LOG_LWDEBUG, although
as separate patches.



>
> The changes to LockBufferForCleanup() don't look right to me.  Waiting
> for a buffer pin might be a reasonable thing to define as a wait
> event, but it shouldn't be reported as if we were waiting on the LWLock
> itself.
>

Makes sense. How about having a new wait class, something like
WAIT_BUFFER, and then have the wait event type as Buffer and the
wait event as BufferPin?  At this moment, I think there will be
only one event in this class, but it seems to me that waiting on a
buffer has merit as a separate class.


>
> What happens if an error is thrown while we're in a wait?
>
>
For LWLocks, only FATAL error is possible which will anyway lead
to initialization of all backend states.  For lock.c, if an error is
thrown, then state is reset in Catch block. In
LockBufferForCleanup(), after we set the wait event and before we
reset it, there is only a chance of FATAL error, if any system call
fails.  We do have one error in enable_timeouts which is called from
ResolveRecoveryConflictWithBufferPin(), but that doesn't seem to
be possible.  Now one question to answer is: what if tomorrow
someone adds a new error after we set the wait state?  So maybe
it is better to clear the wait event in AbortTransaction?



> Does this patch hurt performance?
>
>
This patch adds additional code in the path where we are going
to sleep/wait, but we have changed some shared memory structures, so
it is good to run performance tests.  I am planning to run read-only
and read-write pgbench tests (when data fits in shared buffers); is that
sufficient, or do you expect anything more?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-24 Thread Peter Eisentraut
Could you enhance the documentation about the difference between "wait
event type name" and "wait event name" (examples?)?  This is likely to
be quite confusing for users who are used to just the plain "waiting"
column.




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-24 Thread Robert Haas
On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila  wrote:
>> I wouldn't bother tinkering with it at this point.  The value isn't
>> going to be recorded on disk anywhere, so it will be easy to change
>> the way it's computed in the future if we ever need to do that.
>>
>
> Okay. Find the rebased patch attached with this mail.  I have moved
> this patch to upcoming CF.

I would call the functions pgstat_report_wait_start() and
pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
pgstat_report_end_waiting().

I think pgstat_get_wait_event_type should not return HWLock, a term
that appears nowhere in our source tree at present.  How about just
"Lock"?

I think the wait event types should be documented - and the wait
events too, perhaps.

Maybe it's worth having separate wait event type names for lwlocks and
lwlock tranches.  We could report LWLockNamed and LWLockTranche and
document the difference: "LWLockNamed indicates that the backend is
waiting for a specific, named LWLock.  The event name is the name of
that lock.  LWLockTranche indicates that the backend is waiting for
any one of a group of locks with similar function.  The event name
identifies the general purpose of locks in that group."

There's no requirement that every session have every tranche
registered.  I think we should consider displaying "extension" for any
tranche that's not built-in, or at least for tranches that are not
registered (rather than "unknown wait event").

+   if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)

Isn't the second part automatically true at this point?

The changes to LockBufferForCleanup() don't look right to me.  Waiting
for a buffer pin might be a reasonable thing to define as a wait
event, but it shouldn't be reported as if we were waiting on the LWLock
itself.

What happens if an error is thrown while we're in a wait?

Does this patch hurt performance?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-21 Thread Amit Kapila
On Wed, Feb 3, 2016 at 8:59 AM, Robert Haas  wrote:
>
> On Tue, Feb 2, 2016 at 10:27 PM, Amit Kapila wrote:
> > So, let's leave adding any additional column, but Alexander has brought
> > up a good point about storing the wait_type and actual wait_event
> > information in four bytes.  Currently I have stored the wait_type (aka
> > classId) in the first byte and then two bytes for the wait_event
> > (eventId); the remaining one byte can be used in future if required.
> > However, Alexander is proposing to combine both of these (classId and
> > eventId) into two bytes, which sounds reasonable to me apart from the
> > fact that it might add an operation or two extra in this path.  Do you
> > or anyone else have any preference on this point?
>
> I wouldn't bother tinkering with it at this point.  The value isn't
> going to be recorded on disk anywhere, so it will be easy to change
> the way it's computed in the future if we ever need to do that.
>

Okay. Find the rebased patch attached with this mail.  I have moved
this patch to upcoming CF.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_pg_stat_activity_v10.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-02 Thread Robert Haas
On Mon, Feb 1, 2016 at 8:40 AM, Alexander Korotkov  wrote:
> I wonder if we can use 4-byte wait_event_info more efficiently.
> LWLock number in the tranche would be also useful information to expose.
> Using lwlock number user can determine if there is high concurrency for
> single lwlock in tranche or it is spread over multiple lwlocks.
> I think it would be enough to have 6 bits for event class id and 10 bit for
> event id. So, it would be maximum 64 event classes and maximum 1024 events
> per class. These limits seem to be fair enough for me.
> And then we save 16 bits for lock number. It's certainly not enough for some
> tranches. For instance, number of buffers could be easily more than 2^16.
> However, we could expose at least lower 16 bits. It would be at least
> something. Using this information user at least can make a conclusion like
> "It MIGHT be a high concurrency for single buffer content. Other way it is
> coincidence that a lot of different buffers have the same 16 lower bits.".
>
> Any thoughts?

Meh.  I think that's trying to be too clever.  That seems hard to
document, hard to explain, and likely incomprehensible to users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-02 Thread Amit Kapila
On Tue, Feb 2, 2016 at 10:09 PM, Robert Haas  wrote:
>
> On Mon, Feb 1, 2016 at 8:40 AM, Alexander Korotkov wrote:
> > I wonder if we can use 4-byte wait_event_info more efficiently.
> > LWLock number in the tranche would be also useful information to expose.
> > Using lwlock number user can determine if there is high concurrency for
> > single lwlock in tranche or it is spread over multiple lwlocks.
> > I think it would be enough to have 6 bits for event class id and 10 bit
> > for event id. So, it would be maximum 64 event classes and maximum 1024
> > events per class. These limits seem to be fair enough for me.
> > And then we save 16 bits for lock number. It's certainly not enough for
> > some tranches. For instance, number of buffers could be easily more than
> > 2^16. However, we could expose at least lower 16 bits. It would be at
> > least something. Using this information user at least can make a
> > conclusion like "It MIGHT be a high concurrency for single buffer
> > content. Other way it is coincidence that a lot of different buffers
> > have the same 16 lower bits.".
> >
> > Any thoughts?
>
> Meh.  I think that's trying to be too clever.  That seems hard to
> document, hard to explain, and likely incomprehensible to users.
>

So, let's leave adding any additional column, but Alexander has brought up
a good point about storing the wait_type and actual wait_event
information in four bytes.  Currently I have stored the wait_type (aka
classId) in the first byte and then two bytes for the wait_event
(eventId); the remaining one byte can be used in future if required.
However, Alexander is proposing to combine both of these (classId and
eventId) into two bytes, which sounds reasonable to me apart from the
fact that it might add an operation or two extra in this path.  Do you
or anyone else have any preference on this point?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-02 Thread Robert Haas
On Tue, Feb 2, 2016 at 10:27 PM, Amit Kapila  wrote:
> So, let's leave adding any additional column, but Alexander has brought up
> a good point about storing the wait_type and actual wait_event
> information in four bytes.  Currently I have stored the wait_type (aka
> classId) in the first byte and then two bytes for the wait_event
> (eventId); the remaining one byte can be used in future if required.
> However, Alexander is proposing to combine both of these (classId and
> eventId) into two bytes, which sounds reasonable to me apart from the
> fact that it might add an operation or two extra in this path.  Do you
> or anyone else have any preference on this point?

I wouldn't bother tinkering with it at this point.  The value isn't
going to be recorded on disk anywhere, so it will be easy to change
the way it's computed in the future if we ever need to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-02 Thread Amit Kapila
On Mon, Feb 1, 2016 at 7:10 PM, Alexander Korotkov wrote:

> On Sun, Jan 31, 2016 at 6:55 AM, Amit Kapila wrote:
>
>> On Thu, Jan 28, 2016 at 9:16 AM, Amit Kapila wrote:
>> >
>> > On Thu, Jan 28, 2016 at 2:12 AM, Robert Haas wrote:
>> > >
>> > > On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de wrote:
>> > > > I do think there's a considerable benefit in improving the
>> > > > instrumentation here, but this strikes me as making life more
>> > > > complex for
>> > > > more users than it makes it easier. At the very least this should be
>> > > > split into two fields (type & what we're actually waiting on). I
>> also
>> > > > strongly suspect we shouldn't use in band signaling ("process not
>> > > > waiting"), but rather make the field NULL if we're not waiting on
>> > > > anything.
>> > >
>> > > +1 for splitting it into two fields.
>> > >
>> >
>> > I will take care of this.
>> >
>>
>> As discussed, I have added a new field wait_event_type along with
>> wait_event in pg_stat_activity.  Changed the code to return NULL if the
>> backend is not waiting.  Updated the docs as well.
>>
>
> I wonder if we can use 4-byte wait_event_info more efficiently.
> LWLock number in the tranche would be also useful information to expose.
> Using lwlock number user can determine if there is high concurrency for
> single lwlock in tranche or it is spread over multiple lwlocks.
>

So what you are suggesting is to have additional information,
like a sub_wait_event field, which will display the lock number for LWLocks
that belong to a tranche and will otherwise be NULL for HWLocks
or individual LWLocks.  I see the value in having one more field, but
just displaying some number, one that is not even exact in cases
where there are many LWLocks, doesn't sound very informative.

Anybody else have an opinion on this matter?


> I think it would be enough to have 6 bits for event class id and 10 bit
> for event id. So, it would be maximum 64 event classes and maximum 1024
> events per class. These limits seem to be fair enough for me.
>

I also think those are fair limits; let me try to shrink those into
the number of bits you suggested.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-01 Thread Alexander Korotkov
On Sun, Jan 31, 2016 at 6:55 AM, Amit Kapila wrote:

> On Thu, Jan 28, 2016 at 9:16 AM, Amit Kapila wrote:
> >
> > On Thu, Jan 28, 2016 at 2:12 AM, Robert Haas wrote:
> > >
> > > On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de wrote:
> > > > I do think there's a considerable benefit in improving the
> > > > instrumentation here, but this strikes me as making life more complex
> > > > for
> > > > more users than it makes it easier. At the very least this should be
> > > > split into two fields (type & what we're actually waiting on). I also
> > > > strongly suspect we shouldn't use in band signaling ("process not
> > > > waiting"), but rather make the field NULL if we're not waiting on
> > > > anything.
> > >
> > > +1 for splitting it into two fields.
> > >
> >
> > I will take care of this.
> >
>
> As discussed, I have added a new field wait_event_type along with
> wait_event in pg_stat_activity.  Changed the code to return NULL if the
> backend is not waiting.  Updated the docs as well.
>

I wonder if we can use 4-byte wait_event_info more efficiently.
LWLock number in the tranche would be also useful information to expose.
Using lwlock number user can determine if there is high concurrency for
single lwlock in tranche or it is spread over multiple lwlocks.
I think it would be enough to have 6 bits for event class id and 10 bit for
event id. So, it would be maximum 64 event classes and maximum 1024 events
per class. These limits seem to be fair enough for me.
And then we save 16 bits for lock number. It's certainly not enough for
some tranches. For instance, number of buffers could be easily more than
2^16. However, we could expose at least lower 16 bits. It would be at least
something. Using this information user at least can make a conclusion like
"It MIGHT be a high concurrency for single buffer content. Other way it is
coincidence that a lot of different buffers have the same 16 lower bits.".

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-30 Thread Amit Kapila
On Thu, Jan 28, 2016 at 9:16 AM, Amit Kapila wrote:
>
> On Thu, Jan 28, 2016 at 2:12 AM, Robert Haas wrote:
> >
> > On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de wrote:
> > > I do think there's a considerable benefit in improving the
> > > instrumentation here, but this strikes me as making life more complex
> > > for
> > > more users than it makes it easier. At the very least this should be
> > > split into two fields (type & what we're actually waiting on). I also
> > > strongly suspect we shouldn't use in band signaling ("process not
> > > waiting"), but rather make the field NULL if we're not waiting on
> > > anything.
> >
> > +1 for splitting it into two fields.
> >
>
> I will take care of this.
>

As discussed, I have added a new field wait_event_type along with
wait_event in pg_stat_activity.  Changed the code to return NULL if the
backend is not waiting.  Updated the docs as well.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_pg_stat_activity_v9.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-27 Thread Amit Kapila
On Thu, Jan 28, 2016 at 2:12 AM, Robert Haas  wrote:
>
> On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de wrote:
> > I do think there's a considerable benefit in improving the
> > instrumentation here, but this strikes me as making life more complex for
> > more users than it makes it easier. At the very least this should be
> > split into two fields (type & what we're actually waiting on). I also
> > strongly suspect we shouldn't use in band signaling ("process not
> > waiting"), but rather make the field NULL if we're not waiting on
> > anything.
>
> +1 for splitting it into two fields.
>

I will take care of this.

>
> Regarding making the field NULL, someone (I think you) proposed
> previously that we should have one field indicating whether we are
> waiting, and a separate field (or two) indicating the current or most
> recent wait event.
>

I think to do it that way, we need to change the meaning of
pg_stat_activity.waiting from waiting on locks to waiting on HWLocks,
LWLocks, and other events which we add in future.  That can again be
another source of confusion: if existing users of pg_stat_activity.waiting
are still relying on it, they can get wrong information, or, if they are
aware of the recent change, they need to add an additional check like
(waiting = true AND wait_event_type = 'HWLock').  So I think it is better
to go with the suggestion to display NULL in the new field when
there is no wait.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-27 Thread Robert Haas
On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de  wrote:
> I do think there's a considerable benefit in improving the
> instrumentation here, but this strikes me as making life more complex for
> more users than it makes it easier. At the very least this should be
> split into two fields (type & what we're actually waiting on). I also
> strongly suspect we shouldn't use in band signaling ("process not
> waiting"), but rather make the field NULL if we're not waiting on
> anything.

+1 for splitting it into two fields.

Regarding making the field NULL, someone (I think you) proposed
previously that we should have one field indicating whether we are
waiting, and a separate field (or two) indicating the current or most
recent wait event.  That would be similar to how
pg_stat_activity.{query,state} work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-26 Thread Amit Kapila
On Tue, Jan 26, 2016 at 1:40 PM, and...@anarazel.de wrote:
>
> On 2016-01-26 13:22:09 +0530, Amit Kapila wrote:
> > @@ -633,9 +633,11 @@ postgres   27093  0.0  0.0  30096  2752 ?
 Ss   11:34   0:00 postgres: ser
> >   Time when the state was last
changed
> >  
> >  
> > - waiting
> > - boolean
> > - True if this backend is currently waiting on a lock
> > + wait_event
> > + text
> > + Wait event name if backend is currently waiting, otherwise
> > +  process not waiting
> > + 
> >  
> >  
>
> I still think this is a considerable regression in pg_stat_activity
> usability. There are lots of people out there that have queries that
> automatically monitor pg_stat_activity.waiting, and automatically go to
> pg_locks to understand what's going on, if there's one. With the above
> definition, that got much harder. Not only do I have to write
> WHERE wait_event <> 'process not waiting', but then also parse the wait
> event name, to know whether the process is waiting on a heavyweight
> lock, or something else!
>
> I do think there's a considerable benefit in improving the
> instrumentation here, but this strikes me as making life more complex for
> more users than it makes it easier.
>

Here, we have two ways to expose this functionality to users.  The first
is to expose this new set of information (wait_type, wait_event)
separately, either in a new view or in pg_stat_activity, ask users
to migrate to it, mark pg_stat_activity.waiting as deprecated, and then
remove it in a future version.  The second is to remove
pg_stat_activity.waiting and expose the new set of information, which
forces users to move to it.  I think both ways have their pros and cons;
they are discussed upthread, and based on that I have decided to move
forward with the second way.

> At the very least this should be
> split into two fields (type & what we're actually waiting on).
>

Makes sense to me, so we can represent the wait_type as:
wait_type text, with values such as Lock (or HWLock), LWLock, Network, etc.

Let me know if that is okay or if you have something else in mind.


> I also
> strongly suspect we shouldn't use in band signaling ("process not
> waiting"), but rather make the field NULL if we're not waiting on
> anything.
>

Agree, will change in next version of patch.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-26 Thread and...@anarazel.de
On 2016-01-26 13:22:09 +0530, Amit Kapila wrote:
> @@ -633,9 +633,11 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
> 11:34   0:00 postgres: ser
>   Time when the state was last changed
>  
>  
> - waiting
> - boolean
> - True if this backend is currently waiting on a lock
> + wait_event
> + text
> + Wait event name if backend is currently waiting, otherwise
> +  process not waiting
> + 
>  
>  

I still think this is a considerable regression in pg_stat_activity
usability. There are lots of people out there that have queries that
automatically monitor pg_stat_activity.waiting, and automatically go to
pg_locks to understand what's going on, if there's one. With the above
definition, that got much harder. Not only do I have to write
WHERE wait_event <> 'process not waiting', but then also parse the wait
event name, to know whether the process is waiting on a heavyweight
lock, or something else!

I do think there's a considerable benefit in improving the
instrumentation here, but this strikes me as making life more complex for
more users than it makes it easier. At the very least this should be
split into two fields (type & what we're actually waiting on). I also
strongly suspect we shouldn't use in band signaling ("process not
waiting"), but rather make the field NULL if we're not waiting on
anything.

Greetings,

Andres Freund




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-25 Thread Amit Kapila
On Wed, Jan 20, 2016 at 6:39 PM, Robert Haas  wrote:
>
> On Tue, Jan 19, 2016 at 11:49 PM, Amit Kapila wrote:
> >> On the topic of the UI, I understand that redefining
> >> pg_stat_activity.waiting might cause some short-term annoyance.  But I
> >> think in the long term what we are proposing here is going to be a
> >> huge improvement, so I think it's worth the compatibility break.  If
> >> we say that pg_stat_activity.waiting has to continue meaning "waiting
> >> for a heavyweight lock" even though we now also expose (in some other
> >> location) information on other kinds of waits, that's going to be
> >> confusing to users.
> >
> > If we want to go via this route, then the first thing which we need to
> > decide is whether we want to start displaying the information of
> > background processes like WALWriter and others in pg_stat_activity?
>
> That doesn't seem like a particularly good fit - few of the fields are
> relevant to that case.  We could provide some other way of getting at
> the information for background processes if people want, but
> personally I'd probably be inclined not to bother with it for right
> now.
>

I have updated the patch accordingly.  pg_stat_activity.waiting is
changed to a text column wait_event, and currently it will display the
heavyweight and lightweight lock information for backends; certainly
it can be extended to report network wait or disk wait events, but I feel
that can be done as an add-on patch.  For LWLocks, it returns the LWLock
name for individual locks and the tranche name for others.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_pg_stat_activity_v8.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-20 Thread Robert Haas
On Tue, Jan 19, 2016 at 11:49 PM, Amit Kapila  wrote:
>> My opinion is that storing the information in PGPROC is better because
>> it seems like we can fairly painlessly expose 4 bytes of data that way
>> instead of 1, which is nice.
>
> Okay, do you mean to say that we can place this new 4-byte variable
> in PGPROC at a 4-byte aligned boundary, so both reads and writes will be
> atomic?

Yes.  However, note that you don't need to do anything special to get
it 4-byte aligned.  The compiler will do that automatically.

>> On the topic of the UI, I understand that redefining
>> pg_stat_activity.waiting might cause some short-term annoyance.  But I
>> think in the long term what we are proposing here is going to be a
>> huge improvement, so I think it's worth the compatibility break.  If
>> we say that pg_stat_activity.waiting has to continue meaning "waiting
>> for a heavyweight lock" even though we now also expose (in some other
>> location) information on other kinds of waits, that's going to be
>> confusing to users.
>
> If we want to go via this route, then the first thing which we need to
> decide is whether we want to start displaying the information of
> background processes like WALWriter and others in pg_stat_activity?

That doesn't seem like a particularly good fit - few of the fields are
relevant to that case.  We could provide some other way of getting at
the information for background processes if people want, but
personally I'd probably be inclined not to bother with it for right
now.

>>  It's better to force people to update their
>> queries once than to have this confusing wart in the system forever.
>> I predict that if we make backward compatibility the priority here,
>> we'll still be explaining it to smart but confused people when 9.6
>> goes EOL.
>
> Valid point. OTOH, we can update the docs to say that the
> pg_stat_activity.waiting column is deprecated, and after a
> release or two we can get rid of it.

My impression is that doesn't really ease the pain much.  Half the
time we never actually remove the deprecated column, or much later
than predicted, and people don't read the docs and keep relying on it
anyway.  So in the end it just makes the process more complicated
without really helping.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-19 Thread Robert Haas
On Mon, Jan 18, 2016 at 10:41 PM, Amit Kapila  wrote:
> Initially, we started with extending the 'waiting' column in
> pg_stat_activity,
> to which some people have raised concerns about backward
> compatability, so another option that came-up during discussion was to
> retain waiting as it-is and have an additional column 'wait_event' in
> pg_stat_activity, after that there is feedback that we should try to include
> wait information about background processes as well which raises a bigger
> question whether it is any good to expose this information via
> pg_stat_activity
> (pg_stat_activity doesn't display information about background processes)
> or is it better to have a new view as discussed here [1].
>
> Second important and somewhat related point is whether we should save
> this information in PGPROC as 4 bytes or keep it in pgBackendStatus.
> I think it is better to store in PGPROC, if we want to save wait information
> for backend processes as well.
>
> I am of opinion that we should save this information in PGPROC and
> expose it via new view, but I am open to go other ways based on what
> others think about this matter.

My opinion is that storing the information in PGPROC is better because
it seems like we can fairly painlessly expose 4 bytes of data that way
instead of 1, which is nice.

On the topic of the UI, I understand that redefining
pg_stat_activity.waiting might cause some short-term annoyance.  But I
think in the long term what we are proposing here is going to be a
huge improvement, so I think it's worth the compatibility break.  If
we say that pg_stat_activity.waiting has to continue meaning "waiting
for a heavyweight lock" even though we now also expose (in some other
location) information on other kinds of waits, that's going to be
confusing to users.  It's better to force people to update their
queries once than to have this confusing wart in the system forever.
I predict that if we make backward compatibility the priority here,
we'll still be explaining it to smart but confused people when 9.6
goes EOL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-19 Thread Amit Kapila
On Wed, Jan 20, 2016 at 12:41 AM, Robert Haas  wrote:
>
> On Mon, Jan 18, 2016 at 10:41 PM, Amit Kapila wrote:
> > Second important and somewhat related point is whether we should save
> > this information in PGPROC as 4 bytes or keep it in pgBackendStatus.
> > I think it is better to store in PGPROC, if we want to save wait
information
> > for backend processes as well.
> >
> > I am of opinion that we should save this information in PGPROC and
> > expose it via new view, but I am open to go other ways based on what
> > others think about this matter.
>
> My opinion is that storing the information in PGPROC is better because
> it seems like we can fairly painlessly expose 4 bytes of data that way
> instead of 1, which is nice.
>

Okay, do you mean to say that we can place this new 4-byte variable
in PGPROC at a 4-byte-aligned boundary, so that both reads and writes
will be atomic?
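As a rough illustration of the idea being asked about (a hedged sketch with invented names, not the actual PGPROC layout): a naturally aligned 4-byte integer can be read and written with single loads and stores on mainstream platforms, so a reader never observes a torn value even without locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hedged sketch, not the real PGPROC: a 4-byte wait-event word placed
 * at a naturally aligned offset.  On mainstream platforms an aligned
 * 32-bit load or store is a single instruction, so plain reads and
 * writes of this field are effectively atomic.
 */
typedef struct DemoProc
{
	int			pid;				/* offset 0 */
	uint32_t	wait_event_info;	/* offset 4, 4-byte aligned */
} DemoProc;

/* Writer side: publish wait class and event together with one store. */
static void
demo_report_wait(DemoProc *proc, uint32_t classId, uint32_t eventId)
{
	proc->wait_event_info = (classId << 24) | (eventId & 0x00FFFFFF);
}

/* Reader side: one plain load of the whole word, decoded afterwards. */
static uint32_t
demo_wait_class(const DemoProc *proc)
{
	return proc->wait_event_info >> 24;
}
```

The packing of class and event into one word here is purely illustrative; the point is only that a single aligned 4-byte store publishes both pieces at once.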

> On the topic of the UI, I understand that redefining
> pg_stat_activity.waiting might cause some short-term annoyance.  But I
> think in the long term what we are proposing here is going to be a
> huge improvement, so I think it's worth the compatibility break.  If
> we say that pg_stat_activity.waiting has to continue meaning "waiting
> for a heavyweight lock" even though we now also expose (in some other
> location) information on other kinds of waits, that's going to be
> confusing to users.
>

If we want to go this route, then the first thing we need to decide
is whether we want to start displaying information about background
processes like the WAL writer and others in pg_stat_activity.
A second thing that needs some thought is that functions like
pg_stat_get_activity() would need to rely on both PgBackendStatus and
PGPROC, and we might also need some special handling for background
processes if we want the information for those processes in this view.
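For illustration only, the kind of merge being described might be sketched like this (all names hypothetical; the real pg_stat_get_activity() is considerably more involved): each row comes from PGPROC, with PgBackendStatus data attached only when a matching backend entry exists.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the two structures. */
typedef struct DemoPGPROC
{
	int			pid;
	uint32_t	wait_event_info;	/* present for all processes */
} DemoPGPROC;

typedef struct DemoBackendStatus
{
	int			pid;				/* backends only */
	const char *activity;			/* query text */
} DemoBackendStatus;

typedef struct DemoRow
{
	int			pid;
	uint32_t	wait_event_info;
	const char *activity;			/* NULL for background processes */
} DemoRow;

/*
 * Build one view row from PGPROC, attaching backend-status data when a
 * matching entry exists.  The "special handling" for background
 * processes is simply that they have no activity text.
 */
static DemoRow
demo_make_row(const DemoPGPROC *proc,
			  const DemoBackendStatus *st, int nst)
{
	DemoRow		row = {proc->pid, proc->wait_event_info, NULL};

	for (int i = 0; i < nst; i++)
		if (st[i].pid == proc->pid)
			row.activity = st[i].activity;
	return row;
}
```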

>  It's better to force people to update their
> queries once than to have this confusing wart in the system forever.
> I predict that if we make backward compatibility the priority here,
> we'll still be explaining it to smart but confused people when 9.6
> goes EOL.
>

Valid point. OTOH, we can update the docs to say that the
pg_stat_activity.waiting column is deprecated, and after a
release or two we can get rid of it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-18 Thread Robert Haas
On Mon, Jan 18, 2016 at 11:09 AM, Alvaro Herrera
 wrote:
> Amit Kapila wrote:
>
>> The reason for not updating the patch related to this thread is that it is
>> dependent on the work for refactoring the tranches for LWLocks [1]
>> which is now coming towards an end, so I think it is quite reasonable
>> that the patch can be updated for this work during commit fest, so
>> I am moving it to upcoming CF.
>
> Thanks.  I think the tranche reworks are mostly done now, so is anyone
> submitting an updated version of this patch?
>
> Also, it would be very good if someone can provide insight on how this
> patch interacts with the other submitted patch for "waiting for
> replication" https://commitfest.postgresql.org/8/436/
> Andres seems to think that the other patch is completely independent of
> this one, i.e. the "waiting for replication" column needs to exist
> separately and not as part of the "more descriptive" new 'waiting'
> column.

Yeah, I really don't agree with that.  I think that it's much better
to have one column that says what you are waiting for than a bunch of
separate columns that tell you whether you are waiting for individual
things for which you might be waiting.  I think this patch, which
introduces the general mechanism, should win, and the other patch
should then be one client of that mechanism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-18 Thread Alvaro Herrera
Amit Kapila wrote:

> The reason for not updating the patch related to this thread is that it is
> dependent on the work for refactoring the tranches for LWLocks [1]
> which is now coming towards an end, so I think it is quite reasonable
> that the patch can be updated for this work during commit fest, so
> I am moving it to upcoming CF.

Thanks.  I think the tranche reworks are mostly done now, so is anyone
submitting an updated version of this patch?

Also, it would be very good if someone can provide insight on how this
patch interacts with the other submitted patch for "waiting for
replication" https://commitfest.postgresql.org/8/436/
Andres seems to think that the other patch is completely independent of
this one, i.e. the "waiting for replication" column needs to exist
separately and not as part of the "more descriptive" new 'waiting'
column.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-18 Thread Amit Kapila
On Mon, Jan 18, 2016 at 11:06 PM, Robert Haas  wrote:
>
> On Mon, Jan 18, 2016 at 11:09 AM, Alvaro Herrera
>  wrote:
> > Amit Kapila wrote:
> >
> >> The reason for not updating the patch related to this thread is that
it is
> >> dependent on the work for refactoring the tranches for LWLocks [1]
> >> which is now coming towards an end, so I think it is quite reasonable
> >> that the patch can be updated for this work during commit fest, so
> >> I am moving it to upcoming CF.
> >
> > Thanks.  I think the tranche reworks are mostly done now, so is anyone
> > submitting an updated version of this patch?
> >

Before updating the patch, it is better to clarify a few points, as
mentioned below.

>
> > Also, it would be very good if someone can provide insight on how this
> > patch interacts with the other submitted patch for "waiting for
> > replication" https://commitfest.postgresql.org/8/436/
> > Andres seems to think that the other patch is completely independent of
> > this one, i.e. the "waiting for replication" column needs to exist
> > separately and not as part of the "more descriptive" new 'waiting'
> > column.
>
> Yeah, I really don't agree with that.  I think that it's much better
> to have one column that says what you are waiting for than a bunch of
> separate columns that tell you whether you are waiting for individual
> things for which you might be waiting.  I think this patch, which
> introduces the general mechanism, should win: and the other patch
> should then be one client of that mechanism.
>

I agree with what you have said, but I think the bigger question here is
about the UI and which is the more appropriate place to store wait
information. I will try to summarize the options discussed.

Initially, we started with extending the 'waiting' column in
pg_stat_activity, to which some people raised concerns about backward
compatibility, so another option that came up during discussion was to
retain 'waiting' as it is and have an additional column 'wait_event' in
pg_stat_activity. After that, there was feedback that we should try to
include wait information about background processes as well, which raises
a bigger question: whether it is any good to expose this information via
pg_stat_activity (pg_stat_activity doesn't display information about
background processes), or whether it is better to have a new view as
discussed here [1].

Second important and somewhat related point is whether we should save
this information in PGPROC as 4 bytes or keep it in pgBackendStatus.
I think it is better to store it in PGPROC if we want to save wait
information for background processes as well.

I am of the opinion that we should save this information in PGPROC and
expose it via a new view, but I am open to going other ways based on what
others think about this matter.

[1] -
http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=lwa...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-12-31 Thread Amit Kapila
On Thu, Dec 24, 2015 at 8:02 AM, Michael Paquier 
wrote:
>
> On Tue, Nov 17, 2015 at 8:36 PM, Vladimir Borodin 
wrote:
> >
> > On 14 Nov 2015, at 10:50, Amit Kapila wrote:
> >
> > On Wed, Sep 16, 2015 at 11:22 PM, Robert Haas 
wrote:
> >> On Wed, Sep 16, 2015 at 12:29 PM, Alexander Korotkov
> >>  wrote:
> >
> > One thing that occurred to me in this context is that if we store the
wait
> > event information in PGPROC, then can we think of providing the info
> > about wait events in a separate view pg_stat_waits (or
pg_stat_wait_info or
> > any other better name) where we can display wait information about
> > all-processes rather than only backends?  This will avoid the confusion
> > about breaking the backward compatibility for the current 'waiting'
column
> > in pg_stat_activity.
> >
> > pg_stat_waits can have columns:
> > pid  - Process Id
> > wait_class_name - Name of the wait class
> > wait class_event  - name of the wait event
> >
> > We can extend it later with the information about timing for wait event.
> >
> > Also, if we follow this approach, I think we don't need to store this
> > information in PgBackendStatus.
> >
> >
> > Sounds like exactly the same that was proposed by Ildus in this thread
[0].
> > Great to be thinking in the same direction. And on the rights of
> > advertisements I’ve somehow described using all those views here [1].
> >
> > [0] http://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru
> > [1] https://simply.name/pg-stat-wait.html
>
> This thread has stalled a bit and is waiting for new patches for some
> time now, hence I have switched it as "returned with feedback" on the
> CF app.
>

The reason for not updating the patch related to this thread is that it is
dependent on the work of refactoring the tranches for LWLocks [1],
which is now coming to an end. I think it is quite reasonable
that the patch can be updated for this work during the commit fest, so
I am moving it to the upcoming CF.

[1] -
http://www.postgresql.org/message-id/caa4ek1kjshwp0dfcq6chpnq_wjqwleapmfd4z6kusltqtuz...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-12-23 Thread Michael Paquier
On Tue, Nov 17, 2015 at 8:36 PM, Vladimir Borodin  wrote:
>
> On 14 Nov 2015, at 10:50, Amit Kapila wrote:
>
> On Wed, Sep 16, 2015 at 11:22 PM, Robert Haas  wrote:
>> On Wed, Sep 16, 2015 at 12:29 PM, Alexander Korotkov
>>  wrote:
>>
>> >> I think it's reasonable to consider reporting this data in the PGPROC
>> >> using a 4-byte integer rather than reporting it through a singe byte
>> >> in the backend status structure.  I believe that addresses the
>> >> concerns about reporting from auxiliary processes, and it also allows
>> >> a little more data to be reported.  For anything in excess of that, I
>> >> think we should think rather harder.  Most likely, such addition
>> >> detail should be reported only for certain types of wait events, or on
>> >> a delay, or something like that, so that the core mechanism remains
>> >> really, really fast.
>> >
>> > That sounds reasonable. There are many pending questions, but it seems
>> > like
>> > step forward to me.
>>
>> Great, let's do it.  I think we should probably do the work to
>> separate the non-individual lwlocks into tranches first, though.
>>
>
> One thing that occurred to me in this context is that if we store the wait
> event information in PGPROC, then can we think of providing the info
> about wait events in a separate view pg_stat_waits (or pg_stat_wait_info or
> any other better name) where we can display wait information about
> all-processes rather than only backends?  This will avoid the confusion
> about breaking the backward compatibility for the current 'waiting' column
> in pg_stat_activity.
>
> pg_stat_waits can have columns:
> pid  - Process Id
> wait_class_name - Name of the wait class
> wait class_event  - name of the wait event
>
> We can extend it later with the information about timing for wait event.
>
> Also, if we follow this approach, I think we don't need to store this
> information in PgBackendStatus.
>
>
> Sounds like exactly the same that was proposed by Ildus in this thread [0].
> Great to be thinking in the same direction. And on the rights of
> advertisements I’ve somehow described using all those views here [1].
>
> [0] http://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru
> [1] https://simply.name/pg-stat-wait.html

This thread has stalled a bit and has been waiting for new patches for some
time now, hence I have switched it to "returned with feedback" on the
CF app.
-- 
Michael




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-11-17 Thread Vladimir Borodin

> On 14 Nov 2015, at 10:50, Amit Kapila wrote:
> 
> On Wed, Sep 16, 2015 at 11:22 PM, Robert Haas wrote:
> > On Wed, Sep 16, 2015 at 12:29 PM, Alexander Korotkov wrote:
> >
> > >> I think it's reasonable to consider reporting this data in the PGPROC
> > >> using a 4-byte integer rather than reporting it through a singe byte
> > >> in the backend status structure.  I believe that addresses the
> > >> concerns about reporting from auxiliary processes, and it also allows
> > >> a little more data to be reported.  For anything in excess of that, I
> > >> think we should think rather harder.  Most likely, such addition
> > >> detail should be reported only for certain types of wait events, or on
> > >> a delay, or something like that, so that the core mechanism remains
> > >> really, really fast.
> > >
> > > That sounds reasonable. There are many pending questions, but it seems 
> > > like
> > > step forward to me.
> >
> > Great, let's do it.  I think we should probably do the work to
> > separate the non-individual lwlocks into tranches first, though.
> >
> 
> One thing that occurred to me in this context is that if we store the wait
> event information in PGPROC, then can we think of providing the info
> about wait events in a separate view pg_stat_waits (or pg_stat_wait_info or
> any other better name) where we can display wait information about
> all-processes rather than only backends?  This will avoid the confusion
> about breaking the backward compatibility for the current 'waiting' column
> in pg_stat_activity.
> 
> pg_stat_waits can have columns:
> pid  - Process Id
> wait_class_name - Name of the wait class
> wait class_event  - name of the wait event
> 
> We can extend it later with the information about timing for wait event.
> 
> Also, if we follow this approach, I think we don't need to store this
> information in PgBackendStatus.

Sounds like exactly the same thing that was proposed by Ildus in this thread [0].
Great to be thinking in the same direction. And, by way of advertisement,
I have described using all those views here [1].

[0] http://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru
[1] https://simply.name/pg-stat-wait.html

>   
> 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com 

--
May the force be with you…
https://simply.name



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-11-14 Thread Craig Ringer
On 14 November 2015 at 15:50, Amit Kapila  wrote:


> One thing that occurred to me in this context is that if we store the wait
> event information in PGPROC, then can we think of providing the info
> about wait events in a separate view pg_stat_waits (or pg_stat_wait_info or
> any other better name) where we can display wait information about
> all-processes rather than only backends?
>

Sounds good to me. Consider a logical decoding plugin in a walsender for
example.


> This will avoid the confusion
> about breaking the backward compatibility for the current 'waiting' column
> in pg_stat_activity.
>

I'm about -10^7 on changing the 'waiting' column. I am still seeing
confused users from the 'procpid' to 'pid' renaming. If more info is
needed, add a column with more detail or, as you suggest here, use a new
view.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-11-13 Thread Amit Kapila
On Wed, Sep 16, 2015 at 11:22 PM, Robert Haas  wrote:
> On Wed, Sep 16, 2015 at 12:29 PM, Alexander Korotkov
>  wrote:
>
> >> I think it's reasonable to consider reporting this data in the PGPROC
> >> using a 4-byte integer rather than reporting it through a singe byte
> >> in the backend status structure.  I believe that addresses the
> >> concerns about reporting from auxiliary processes, and it also allows
> >> a little more data to be reported.  For anything in excess of that, I
> >> think we should think rather harder.  Most likely, such addition
> >> detail should be reported only for certain types of wait events, or on
> >> a delay, or something like that, so that the core mechanism remains
> >> really, really fast.
> >
> > That sounds reasonable. There are many pending questions, but it seems
like
> > step forward to me.
>
> Great, let's do it.  I think we should probably do the work to
> separate the non-individual lwlocks into tranches first, though.
>

One thing that occurred to me in this context is that if we store the wait
event information in PGPROC, then can we think of providing the info
about wait events in a separate view pg_stat_waits (or pg_stat_wait_info or
any other better name) where we can display wait information about
all-processes rather than only backends?  This will avoid the confusion
about breaking the backward compatibility for the current 'waiting' column
in pg_stat_activity.

pg_stat_waits can have columns:
pid  - Process Id
wait_class_name - Name of the wait class
wait class_event  - name of the wait event

We can extend it later with the information about timing for wait event.

Also, if we follow this approach, I think we don't need to store this
information in PgBackendStatus.
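As a hedged sketch of how such a view could derive its wait_class_name column from a packed 4-byte value (the class table and packing scheme below are invented for illustration, not from any posted patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Invented class names and packing scheme, purely to illustrate how a
 * pg_stat_waits-style view could map the packed 4-byte value to the
 * proposed wait_class_name column.
 */
static const char *const demo_class_names[] = {
	"LWLockNamed", "LWLockTranche", "Lock", "BufferPin"
};

/* Pack a class id into the high byte and an event id into the rest. */
static uint32_t
demo_pack(uint32_t classId, uint32_t eventId)
{
	return (classId << 24) | (eventId & 0x00FFFFFF);
}

/* Decode the class portion back to a display name for the view row. */
static const char *
demo_wait_class_name(uint32_t wait_event_info)
{
	uint32_t	classId = wait_event_info >> 24;
	uint32_t	nclasses = sizeof(demo_class_names) / sizeof(demo_class_names[0]);

	return (classId < nclasses) ? demo_class_names[classId] : "Unknown";
}
```

A companion lookup from event id to event name would fill the wait_event column the same way.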


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-20 Thread Robert Haas
On Fri, Sep 18, 2015 at 4:43 PM, Vladimir Borodin  wrote:
> No, probably you misunderstood the results, let me explain one more time.

Yeah, I did, sorry.

> Unpatched PostgreSQL from REL9_4_STABLE gave 15500 tps. Version with timings
> - 14500 tps which is 6,5% worse. Version with sampling wait events every 10
> ms gave 13800 tps (11% worse than unpatched and 5% worse than with timings).

OK.

So, I don't really care about the timing stuff right now.  I want to
get the stuff without timings done first.  To do that, we need to come
to an agreement on how much information we're going to try to expose
and from where, and we need to make sure that doing that doesn't cause
a performance hit.  Then, if we want to add timing as an optional
feature for people who can tolerate the overhead, that's fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-18 Thread Vladimir Borodin

> On 18 Sep 2015, at 20:16, Robert Haas wrote:
> 
> On Fri, Sep 18, 2015 at 4:08 AM, Vladimir Borodin  wrote:
>> For both scenarios on linux we got approximately the same results - version
>> with timings was faster then version with sampling (sampling was done every
>> 10 ms). Vanilla PostgreSQL from REL9_4_STABLE gave ~15500 tps and version
>> with timings gave ~14500 tps while version with sampling gave ~13800 tps. In
>> all cases processor was 100% utilized. Comparing vanilla PostgreSQL and
>> version with timings on constant workload (12000 tps) gave the following
>> results in latencies for queries:
> 
> If the timing is speeding things up, that's most likely a sign that
> the spinlock contention on that workload is so severe that you are
> spending a lot of time in s_lock.  Adding more things for the system
> to do that don't require that lock will speed the system up by
> reducing the contention.  Instead of inserting gettimeofday() calls,
> you could insert a for loop that counts to some large number without
> doing any useful work, and that would likely have a similar effect.
> 
> In any case, I think your experiment clearly proves that the presence
> or absence of this instrumentation *is* performance-relevant and that
> we *do* need to worry about what it costs. If the system gets 20%
> faster when you call gettimeofday() a lot, does that mean we should
> insert gettimeofday() calls all over the system in random places to
> speed it up?

No, probably you misunderstood the results; let me explain one more time.
Unpatched PostgreSQL from REL9_4_STABLE gave 15500 tps. The version with
timings gave 14500 tps, which is 6.5% worse. The version with sampling wait
events every 10 ms gave 13800 tps (11% worse than unpatched and 5% worse
than with timings).

We also made a test with a stable workload of 12000 tps for the unpatched
version and the version with timings. In that test we saw that response
times are a bit worse in the version with timings, as shown in the table
below. You should read the table as follows: 99% of all queries in the
unpatched version fit in 79 ms, while in the version with timings 99% of
all queries fit in 97 ms, which is 18 ms slower, and so on. That test also
showed that the version with timings consumes an extra 7% of CPU to handle
the same workload as the unpatched version.

So this is the cost of wait monitoring with timings on an lwlock stress
workload: 6.5% less throughput, slightly worse latencies, and an extra 7%
of CPU. If you insert gettimeofday() calls all over the system in random
places, you expectedly will not speed it up; you will make it slower.

q'th    vanilla    timing
99%     79.0       97.0 (+18.0)
98%     64.0       76.0 (+12.0)
95%     38.0       47.0 (+9.0)
90%     16.0       21.0 (+5.0)
85%     7.0        11.0 (+4.0)
80%     5.0        7.0  (+2.0)
75%     4.0        5.0  (+1.0)
50%     2.0        3.0  (+1.0)

> 
> I do agree that if we're going to include support for timings, having
> them be controlled by a GUC is a good idea.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 


--
May the force be with you…
https://simply.name



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-18 Thread Vladimir Borodin

> On 16 Sep 2015, at 20:52, Robert Haas wrote:
> 
> On Wed, Sep 16, 2015 at 12:29 PM, Alexander Korotkov
>  wrote:
>> Yes, the major question is cost. But I think we should validate our thoughts
>> by experiments assuming there are more possible synchronization protocols.
>> Ildus posted implemention of double buffering approach that showed quite low
>> cost.
> 
> I'm not sure exactly which email you are referring to, but I don't
> believe that anyone has done experiments that are anywhere near
> comprehensive enough to convince ourselves that this won't be a
> problem.  If a particular benchmark doesn't show an issue, that can
> just mean that the benchmark isn't hitting the case where there is a
> problem.  For example, EDB has had customers who have severe
> contention apparently on the buffer content lwlocks, resulting in big
> slowdowns.  You don't see that in, say, a pgbench run.  But for people
> who have certain kinds of queries, it's really bad.  Those sort of
> loads, where the lwlock system really gets stressed, are cases where
> adding overhead seems likely to pinch.

Alexander and Ildus gave us two patches for REL9_4_STABLE - one with sampling
every N milliseconds and one with measuring timings. We have tested both of
them on two kinds of our workloads besides pgbench runs. One workload is OLTP
where all the data fits in shared buffers, synchronous_commit is off, and the
bottleneck is ProcArrayLock (all backtraces look something like the following):

[Thread debugging using libthread_db enabled]
0x7f10e01d4f27 in semop () from /lib64/libc.so.6
#0  0x7f10e01d4f27 in semop () from /lib64/libc.so.6
#1  0x0061fe27 in PGSemaphoreLock (sema=0x7f11f2d4f430, interruptOK=0 '\000') at pg_sema.c:421
#2  0x006769ba in LWLockAcquireCommon (l=0x7f10e1e00120, mode=LW_EXCLUSIVE) at lwlock.c:626
#3  LWLockAcquire (l=0x7f10e1e00120, mode=LW_EXCLUSIVE) at lwlock.c:467
#4  0x00667862 in ProcArrayEndTransaction (proc=0x7f11f2d4f420, latestXid=182562881) at procarray.c:404
#5  0x004b579b in CommitTransaction () at xact.c:1957
#6  0x004b6ae5 in CommitTransactionCommand () at xact.c:2727
#7  0x006819d9 in finish_xact_command () at postgres.c:2437
#8  0x00684f05 in PostgresMain (argc=, argv=, dbname=0x21e1a70 "xivadb", username=) at postgres.c:4270
#9  0x00632d7d in BackendRun (argc=, argv=) at postmaster.c:4155
#10 BackendStartup (argc=, argv=) at postmaster.c:3829
#11 ServerLoop (argc=, argv=) at postmaster.c:1597
#12 PostmasterMain (argc=, argv=) at postmaster.c:1244
#13 0x005cadb8 in main (argc=3, argv=0x21e0aa0) at main.c:228


Another is when all the data fits in RAM but does not fit in shared buffers,
and the bottleneck is mostly BufFreelistLock, with noticeable contention around
buffer partition locking and buffer locking. Taking several backtraces of
several backends with GDB and doing some bash magic gives the following:

root@pgload01g ~ # grep '^#4 ' /tmp/bt | awk '{print $2, $4, $NF}' | sort | uniq -c | sort -rn
126 0x0065db61 BufferAlloc bufmgr.c:591
 67 0x0065e03a BufferAlloc bufmgr.c:760
 43 0x005c8c3b pq_getbyte pqcomm.c:899
 39 0x0065dd93 BufferAlloc bufmgr.c:765
  6 0x004b52bb RecordTransactionCommit xact.c:1194
  4 0x0065da0e ReadBuffer_common bufmgr.c:476
  1 ReadBuffer_common relpersistence=112 bufmgr.c:340
  1 exec_eval_expr expr=0x166e908, pl_exec.c:4796
  1 0x7f78b8cb217b ?? /usr/pgsql-9.4/lib/pg_stat_statements.so
  1 0x005d4cbb _copyList copyfuncs.c:3849
root@pgload01g ~ #

For both scenarios on Linux we got approximately the same results: the
version with timings was faster than the version with sampling (sampling was
done every 10 ms). Vanilla PostgreSQL from REL9_4_STABLE gave ~15500 tps, the
version with timings gave ~14500 tps, and the version with sampling gave
~13800 tps. In all cases the processor was 100% utilized. Comparing vanilla
PostgreSQL and the version with timings on a constant workload (12000 tps)
gave the following latencies for queries:

q'th    vanilla    timing
99%     79.0       97.0 (+18.0)
98%     64.0       76.0 (+12.0)
95%     38.0       47.0 (+9.0)
90%     16.0       21.0 (+5.0)
85%     7.0        11.0 (+4.0)
80%     5.0        7.0  (+2.0)
75%     4.0        5.0  (+1.0)
50%     2.0        3.0  (+1.0)

And in that test the version with timings consumed about 7% more CPU. Do
these look like the results of a workload where the lwlock system is
stressed?

And when the data does not fit in RAM, you really don't see much difference
between all three versions, because the contention moves from the lwlock
system to I/O, even with the newest NVMe SSDs, or at least is divided
between lwlocks and other wait events.

> 
>> Yes, but some competing products also provides 

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-18 Thread Robert Haas
On Fri, Sep 18, 2015 at 4:08 AM, Vladimir Borodin  wrote:
> For both scenarios on linux we got approximately the same results - version
> with timings was faster then version with sampling (sampling was done every
> 10 ms). Vanilla PostgreSQL from REL9_4_STABLE gave ~15500 tps and version
> with timings gave ~14500 tps while version with sampling gave ~13800 tps. In
> all cases processor was 100% utilized. Comparing vanilla PostgreSQL and
> version with timings on constant workload (12000 tps) gave the following
> results in latencies for queries:

If the timing is speeding things up, that's most likely a sign that
the spinlock contention on that workload is so severe that you are
spending a lot of time in s_lock.  Adding more things for the system
to do that don't require that lock will speed the system up by
reducing the contention.  Instead of inserting gettimeofday() calls,
you could insert a for loop that counts to some large number without
doing any useful work, and that would likely have a similar effect.

In any case, I think your experiment clearly proves that the presence
or absence of this instrumentation *is* performance-relevant and that
we *do* need to worry about what it costs. If the system gets 20%
faster when you call gettimeofday() a lot, does that mean we should
insert gettimeofday() calls all over the system in random places to
speed it up?
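For readers who want to reproduce this kind of overhead estimate themselves, a toy harness along these lines (assuming only POSIX gettimeofday(); results are machine-dependent) counts how many timestamp calls fit into a fixed wall-clock budget, which gives a rough per-call cost:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Toy micro-measurement of gettimeofday() cost: count how many calls
 * complete within roughly a 1 ms wall-clock budget.  The per-call cost
 * is then about 1 ms divided by the returned count.  This is only an
 * estimate; real instrumentation cost also depends on cache effects
 * and contention, as discussed above.
 */
static long
demo_timestamp_calls_per_ms(void)
{
	struct timeval start,
				now;
	long		calls = 0;

	gettimeofday(&start, NULL);
	do
	{
		gettimeofday(&now, NULL);
		calls++;
	} while ((now.tv_sec - start.tv_sec) * 1000000L +
			 (now.tv_usec - start.tv_usec) < 1000L);	/* ~1 ms budget */

	return calls;
}
```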

I do agree that if we're going to include support for timings, having
them be controlled by a GUC is a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-16 Thread Alexander Korotkov
On Mon, Sep 14, 2015 at 3:03 PM, Robert Haas  wrote:

> On Mon, Sep 14, 2015 at 5:32 AM, Alexander Korotkov
>  wrote:
> > In order to build the consensus we need the roadmap for waits monitoring.
> > Would single byte in PgBackendStatus be the only way for tracking wait
> > events? Could we have pluggable infrastructure in waits monitoring: for
> > instance, hooks for wait event begin and end?
>
> No, it's not the only way of doing it.  I proposed doing that way
> because it's simple and cheap, but I'm not hell-bent on it.  My basic
> concern here is about the cost of this.  I think that the most data we
> can report without some kind of synchronization protocol is one 4-byte
> integer.  If we want to report anything more than that, we're going to
> need something like the st_changecount protocol, or a lock, and that's
> going to add very significantly - and in my view unacceptably - to the
> cost.  I care very much about having this facility be something that
> we can use in lots of places, even extremely frequent operations like
> buffer reads and contended lwlock acquisition.
>

Yes, the major question is cost. But I think we should validate our
thoughts with experiments, assuming there are more possible synchronization
protocols. Ildus posted an implementation of a double-buffering approach
that showed quite low cost.

> I think that there may be some *kinds of waits* for which it's
> practical to report additional detail.  For example, suppose that when
> a heavyweight lock wait first happens, we just report the lock type
> (relation, tuple, etc.) but then when the deadlock detector expires,
> if we're still waiting, we report the entire lock tag.  Well, that's
> going to happen infrequently enough, and is expensive enough anyway,
> that the cost doesn't matter.  But if, every time we read a disk
> block, we take a lock (or bump a changecount and do a write barrier),
> dump the whole block tag in there, release the lock (or do another
> write barrier and bump the changecount again) that sounds kind of
> expensive to me.  Maybe we can prove that it doesn't matter on any
> workload, but I doubt it.  We're fighting for every cycle in some of
> these code paths, and there's good evidence that we're burning too
> many of them compared to competing products already.
>

Yes, but some competing products provide comprehensive waits
monitoring too. That makes me think it should be possible for us as well.

> I am not a big fan of hooks as a way of resolving disagreements about
> the design.  We may find that there are places where it's useful to
> have hooks so that different extensions can do different things, and
> that is fine.  But we shouldn't use that as a way of punting the
> difficult questions.  There isn't enough common understanding here of
> what we're all trying to get done and why we're trying to do it in
> particular ways rather than in other ways to jump to the conclusion
> that a hook is the right answer.  I'd prefer to have a nice, built-in
> system that everyone agrees represents a good set of trade-offs than
> an extensible system.
>

I think the reason for hooks could be not only disagreements about the
design, but platform-dependent issues too.
The next step, after we have a view with current wait events, will be
gathering some statistics about them. We can contrast at least two
approaches here:
1) Periodic sampling of current wait events.
2) Measuring each wait event's duration. We could collect statistics locally
for a short period and update a shared memory structure periodically (using
some synchronization protocol).

In the previous attempt to gather lwlock statistics, you predicted that
sampling could have a significant overhead [1]. In contrast, on many
systems time measurements are cheap. We have implemented both approaches,
and it turns out that sampling every 1 millisecond produces higher overhead
than individual duration measurements for each wait event. We can share
another version of waits monitoring based on sampling to make these results
reproducible for everybody. However, cheap time measurements are not
available on every platform. For instance, ISTM that on Windows time
measurements are too expensive [2].

That makes me think that we need a pluggable solution, at least for
statistics: direct measurement of event durations for the majority of
systems, and sampling for the others as the lesser harm.

> I think it's reasonable to consider reporting this data in the PGPROC
> using a 4-byte integer rather than reporting it through a single byte
> in the backend status structure.  I believe that addresses the
> concerns about reporting from auxiliary processes, and it also allows
> a little more data to be reported.  For anything in excess of that, I
> think we should think rather harder.  Most likely, such additional
> detail should be reported only for certain types of wait events, or on
> a delay, or something like that, so that the core mechanism remains
> really, really fast.


That 

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-16 Thread Robert Haas
On Wed, Sep 16, 2015 at 12:29 PM, Alexander Korotkov
 wrote:
> Yes, the major question is cost. But I think we should validate our thoughts
> by experiments, assuming there are more possible synchronization protocols.
> Ildus posted an implementation of the double buffering approach that showed
> quite low cost.

I'm not sure exactly which email you are referring to, but I don't
believe that anyone has done experiments that are anywhere near
comprehensive enough to convince ourselves that this won't be a
problem.  If a particular benchmark doesn't show an issue, that can
just mean that the benchmark isn't hitting the case where there is a
problem.  For example, EDB has had customers who have severe
contention apparently on the buffer content lwlocks, resulting in big
slowdowns.  You don't see that in, say, a pgbench run.  But for people
who have certain kinds of queries, it's really bad.  Those sorts of
loads, where the lwlock system really gets stressed, are cases where
adding overhead seems likely to pinch.

> Yes, but some competing products provide comprehensive waits
> monitoring too. That makes me think it should be possible for us as well.

I agree, but keep in mind that some of those products may use
techniques to reduce the overhead that we don't have available.  I
have a strong suspicion that one of those products in particular has
done something clever to make measuring the time cheap on all
platforms.  Whatever that clever thing is, we haven't done it.  So
that matters.

> I think the reason for hooks could be not only disagreements about design,
> but platform dependent issues too.
> Next step after we have view with current wait events will be gathering some
> statistics of them. We can oppose at least two approaches here:
> 1) Periodical sampling of current wait events.
> 2) Measure each wait event duration. We could collect statistics for short
> period locally and update shared memory structure periodically (using some
> synchronization protocol).
>
> In the previous attempt to gather lwlocks statistics, you predict that
> sampling could have a significant overhead [1]. In contrast, on many systems
> time measurements are cheap. We have implemented both approaches and it
> shows that sampling every 1 milliseconds produce higher overhead than
> individual duration measurements for each wait event. We can share another
> version of waits monitoring based on sampling to make these results
> reproducible for everybody. However, cheap time measurements are available
> not for each platform. For instance, ISTM that on Windows time measurements
> are too expensive [2].
>
> That makes me think that we need pluggable solution, at least for
> statistics: direct measuring of events durations for majority of systems and
> sampling for others as the least harm.

To me, those seem like arguments for making it configurable, but not
necessarily for having hooks.

>> I think it's reasonable to consider reporting this data in the PGPROC
>> using a 4-byte integer rather than reporting it through a single byte
>> in the backend status structure.  I believe that addresses the
>> concerns about reporting from auxiliary processes, and it also allows
>> a little more data to be reported.  For anything in excess of that, I
>> think we should think rather harder.  Most likely, such additional
>> detail should be reported only for certain types of wait events, or on
>> a delay, or something like that, so that the core mechanism remains
>> really, really fast.
>
> That sounds reasonable. There are many pending questions, but it seems like
> step forward to me.

Great, let's do it.  I think we should probably do the work to
separate the non-individual lwlocks into tranches first, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Amit Kapila
On Mon, Sep 14, 2015 at 3:02 PM, Alexander Korotkov 
wrote:

> On Sat, Sep 12, 2015 at 3:24 PM, Amit Kapila 
> wrote:
>
>> On Fri, Aug 14, 2015 at 7:23 PM, Alexander Korotkov 
>> wrote:
>> >
>> > On Thu, Aug 6, 2015 at 1:01 PM, Ildus Kurbangaliev <
>> i.kurbangal...@postgrespro.ru> wrote:
>> >>
>> >>
>> >> I've looked deeper and I found PgBackendStatus to be not a suitable
>> >> place for keeping information about low level waits. Really,
>> PgBackendStatus
>> >> is used to track high level information about backend. This is why
>> auxiliary
>> >> processes don't have PgBackendStatus, because they don't have such
>> information
>> >> to expose. But when we come to the low level wait events then auxiliary
>> >> processes are as useful for monitoring as backends are. WAL writer,
>> >> checkpointer, bgwriter etc are using LWLocks as well. This is
>> certainly unclear
>> >> why they can't be monitored.
>> >>
>> >> This is why I think we shouldn't place wait event into PgBackendStatus.
>> It
>> >> could be placed into PGPROC or even separate data structure with
>> different
>> >> concurrency model which would be most suitable for monitoring.
>> >
>> >
>> > +1 for tracking wait events not only for backends
>> >
>> > Ildus, could you do following?
>> > 1) Extract LWLocks refactoring into separate patch.
>> > 2) Make a patch with storing current wait event information in PGPROC.
>> >
>>
>> Now as Robert has committed standardization of lwlock names in
>> commit - aa65de04, let us try to summarize and work on remaining parts
>> of the patch. So I think now the next set of work is as follows:
>>
>> 1. Modify the tranche mechanism so that information about LWLocks
>> can be tracked easily.  For this already there is some discussion, ideas
>> and initial patch is floated in this thread and there doesn't seem to be
>> much
>> conflicts, so we can write the patch for it.  I am planning to write or
>> modify
>> the existing patch unless you, Ildus or anyone has objections or want to
>> write it, please let me know to avoid duplication of work.
>>
>> 2. Track wait_event in pg_stat_activity.  Now here the main point where
>> we doesn't seem to be in complete agreement is that shall we keep it
>> as one byte in PgBackendStatus or use two or more bytes to store
>> wait_event information and still there is another point made by you to
>> store it in PGProc rather than in PgBackendStatus so that we can display
>> information of background processes as well.
>>
>> Now as a matter of consensus, I think Tom has raised a fair point [1]
>> against
>> storing this information in PGProc and I feel that it is relatively less
>> important to have such information about background processes and the
>> reason for same is explained upthread [2].  About having more than
>> one-byte
>> to store information about various wait_events, I think now we will not
>> have
>> more than 100 or so events to track, do you really think that anytime in
>> forseeable
>> future we will have more than 256 important events which we would like to
>> track?
>>
>> So I think about this lets first try to build the consensus and then
>> attempt to
>> write or modify the patch.
>>
>>
>> [1] - http://www.postgresql.org/message-id/4067.1439561...@sss.pgh.pa.us
>> [2] -
>> http://www.postgresql.org/message-id/CAA4eK1JRQGTgRMRao6H65rA=fhifucnqhx7ongy58um6rn1...@mail.gmail.com
>>
>
> In order to build the consensus we need the roadmap for waits monitoring.
> Would single byte in PgBackendStatus be the only way for tracking wait
> events? Could we have pluggable infrastructure in waits monitoring: for
> instance, hooks for wait event begin and end?
> Limit of 256 wait events is probably OK. But we have no room for any
> additional parameters of wait event. For instance, if you notice high
> contention for buffer content lock then it would be nice to figure out: how
> many blocks are involved?, which blocks? We need to track additional
> parameters in order to answer this question.
>

We can track additional parameters by default or based on some
parameter, but do you think that tracking backends wait_event
information as proposed hinders in any which way the future extensions
in this area?  The point is that detailed discussion of other parameters
could be better done separately unless you think that this can block
future enhancements for waits monitoring.  I see wait monitoring of overall
database as a really valuable feature, but not able to see compelling
need to sort out everything before this patch.  Having said that, I am open
for discussion if you want specific things to be sorted out before moving
further.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Robert Haas
On Mon, Sep 14, 2015 at 5:32 AM, Alexander Korotkov
 wrote:
> In order to build the consensus we need the roadmap for waits monitoring.
> Would single byte in PgBackendStatus be the only way for tracking wait
> events? Could we have pluggable infrastructure in waits monitoring: for
> instance, hooks for wait event begin and end?

No, it's not the only way of doing it.  I proposed doing that way
because it's simple and cheap, but I'm not hell-bent on it.  My basic
concern here is about the cost of this.  I think that the most data we
can report without some kind of synchronization protocol is one 4-byte
integer.  If we want to report anything more than that, we're going to
need something like the st_changecount protocol, or a lock, and that's
going to add very significantly - and in my view unacceptably - to the
cost.  I care very much about having this facility be something that
we can use in lots of places, even extremely frequent operations like
buffer reads and contended lwlock acquisition.

I think that there may be some *kinds of waits* for which it's
practical to report additional detail.  For example, suppose that when
a heavyweight lock wait first happens, we just report the lock type
(relation, tuple, etc.) but then when the deadlock detector expires,
if we're still waiting, we report the entire lock tag.  Well, that's
going to happen infrequently enough, and is expensive enough anyway,
that the cost doesn't matter.  But if, every time we read a disk
block, we take a lock (or bump a changecount and do a write barrier),
dump the whole block tag in there, release the lock (or do another
write barrier and bump the changecount again) that sounds kind of
expensive to me.  Maybe we can prove that it doesn't matter on any
workload, but I doubt it.  We're fighting for every cycle in some of
these code paths, and there's good evidence that we're burning too
many of them compared to competing products already.

I am not a big fan of hooks as a way of resolving disagreements about
the design.  We may find that there are places where it's useful to
have hooks so that different extensions can do different things, and
that is fine.  But we shouldn't use that as a way of punting the
difficult questions.  There isn't enough common understanding here of
what we're all trying to get done and why we're trying to do it in
particular ways rather than in other ways to jump to the conclusion
that a hook is the right answer.  I'd prefer to have a nice, built-in
system that everyone agrees represents a good set of trade-offs than
an extensible system.

I think it's reasonable to consider reporting this data in the PGPROC
using a 4-byte integer rather than reporting it through a single byte
in the backend status structure.  I believe that addresses the
concerns about reporting from auxiliary processes, and it also allows
a little more data to be reported.  For anything in excess of that, I
think we should think rather harder.  Most likely, such additional
detail should be reported only for certain types of wait events, or on
a delay, or something like that, so that the core mechanism remains
really, really fast.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Amit Kapila
On Sun, Sep 13, 2015 at 8:06 AM, Robert Haas  wrote:
>
> On Sat, Sep 12, 2015 at 8:24 AM, Amit Kapila 
wrote:
> > 1. Modify the tranche mechanism so that information about LWLocks
> > can be tracked easily.  For this already there is some discussion, ideas
> > and initial patch is floated in this thread and there doesn't seem to be
> > much
> > conflicts, so we can write the patch for it.  I am planning to write or
> > modify
> > the existing patch unless you, Ildus or anyone has objections or want to
> > write it, please let me know to avoid duplication of work.
>
> What I'd like to see happen here is two new API calls.  During the
> early initialization (before shared memory sizing, and during
> process_shared_preload_libraries), backends in either the core code or
> a loadable module can call RequestLWLockTranche(char *, int) to
> request a tranche with the given name and number of locks.
>

Won't this new API introduce another failure mode, namely a collision of
tranche names?  Will it raise the error for a collision during the Request
call or during shared-memory creation?

I understand that such an API could simplify the maintenance of tranches,
but won't it be slightly less user-friendly, in particular due to the
additional requirement of providing a tranche name whose meaning extension
users might or might not understand?


> Then, when
> shared memory is created, the core code creates a tranche which is
> part of MainLWLockArray.  The base of the tranche points to the first
> lock in that tranche, and the tranche is automatically registered for
> all subsequent backends.
>
I think the same has to be done for tranches outside MainLWLockArray, like
WAL or ReplicationOrigin?

>  In EXEC_BACKEND builds, this requires
> stashing the LWLockTranche and the name to which it points in shared
> memory somewhere, so that exec'd backends can look at shared memory
> and redo the registration.  In non-EXEC_BACKEND builds the values can
> just be inherited via fork.  Then, we add a second API call
> LookupLWTrancheByName(char *) which does just that.  This gets used to
> initialize backend-private pointers to the various tranches.
>

So will this also require changes in the way extensions assign locks?
Currently they call LWLockAssign() during shared-memory initialization, so
will that need to change?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Alexander Korotkov
On Mon, Sep 14, 2015 at 2:12 PM, Amit Kapila 
wrote:

> On Mon, Sep 14, 2015 at 2:25 PM, Alexander Korotkov 
> wrote:
>
>> On Sat, Sep 12, 2015 at 2:05 PM, Amit Kapila 
>> wrote:
>>
>>> On Thu, Aug 6, 2015 at 3:31 PM, Ildus Kurbangaliev <
>>> i.kurbangal...@postgrespro.ru> wrote:
>>> >
>>> > On 08/05/2015 09:33 PM, Robert Haas wrote:
>>> >>
>>> >>
>>> >> You're missing the point.  Those multi-byte fields have additional
>>> >> synchronization requirements, as I explained in some detail in my
>>> >> previous email. You can't just wave that away.
>>> >
>>> > I see that now. Thank you for the point.
>>> >
>>> > I've looked deeper and I found PgBackendStatus to be not a suitable
>>> > place for keeping information about low level waits. Really,
>>> PgBackendStatus
>>> > is used to track high level information about backend. This is why
>>> auxiliary
>>> > processes don't have PgBackendStatus, because they don't have such
>>> information
>>> > to expose. But when we come to the low level wait events then auxiliary
>>> > processes are as useful for monitoring as backends are. WAL writer,
>>> > checkpointer, bgwriter etc are using LWLocks as well. This is
>>> certainly unclear
>>> > why they can't be monitored.
>>> >
>>>
>>> I think the chances of background processes stuck in LWLock is quite less
>>> as compare to backends as they do the activities periodically.  As an
>>> example
>>> WALWriter will take WALWriteLock to write the WAL, but actually there
>>> will never
>>> be any much contention for WALWriter. In synchronous_commit = on, the
>>> backends themselves write the WAL so WALWriter won't do much in that
>>> case and for synchronous_commit = off, backends won't write the WAL so
>>> WALWriter won't face any contention unless some buffers have to be
>>> written
>>> by bgwriter or checkpoint for which WAL is not flushed which I don't
>>> think
>>> would lead to any contention.
>>>
>>
>> Hmm, synchronous_commit is per session variable: some transactions could
>> run with synchronous_commit = on, but some with synchronous_commit = off.
>> This is very popular feature of PostgreSQL: achieve better performance by
>> making non-critical transaction asynchronous while leaving critical
>> transactions synchronous. Thus, contention for WALWriteLock between
>> backends and WALWriter could be real.
>>
>>
> I think it is difficult to say that can lead to contention due to periodic
> nature of WALWriter, but I don't deny that there is chance for
> background processes to have contention.
>

We don't know in advance whether there could be contention. This is why we
need monitoring.


> I am not denying from the fact that there could be some contention in rare
>>> scenarios for background processes, but I think tracking them is not as
>>> important as tracking the LWLocks for backends.
>>>
>>
>> I would be more careful in calling some of scenarios rare. As DBMS
>> developers we should do our best to evade contention for LWLocks: any
>> contention, not only between backends and background processes.  One may
>> assume that high LWLock contention is rare scenario in general. Once we're
>> here we doesn't think so, though.
>> You claims that there couldn't be contention for WALWriteLock between
>> backends and WALWriter. This is unclear for me: I think it could be.
>>
>
> I think there would be more things where background processes could wait
> than LWLocks and I think they are important to track, but could be done
> separately
> from tracking them for pg_stat_activity.  Example, we have a
> pg_stat_bgwriter
> view, can't we think of tracking bgwriter/checkpointer wait information in
> that
> view and similarly for other background processes we can track in other
> views
> if any related view exists or create a new one to track for all background
> processes.
>
>
>> Nobody opposes tracking wait events for backends and tracking them for
>> background processes. I think we need to track both in order to provide
>> full picture to DBA.
>>
>>
> Sure, that is good to do, but can't we do it separately in another patch.
> I think in this patch lets just work for wait_events for backends.
>

Yes, but I think we should have a design for tracking wait events for every
process before implementing this only for backends.


> Also as we are planning to track the wait_event information in
>>> pg_stat_activity
>>> along with other backends information, it will not make sense to include
>>> information about backend processes in this variable as pg_stat_activity
>>> just displays information of backend processes.
>>>
>>
>> I'm not objecting that we should track only backends information in
>> pg_stat_activity. I think we should have also some other way of tracking
>> wait events for background processes. We should think it out before
>> extending pg_stat_activity to evade design issues later.
>>
>>
> I think we  can discuss if you see any specific problems or you want
> 

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Alexander Korotkov
On Mon, Sep 14, 2015 at 2:25 PM, Amit Kapila 
wrote:

> On Mon, Sep 14, 2015 at 3:02 PM, Alexander Korotkov 
> wrote:
>
>> On Sat, Sep 12, 2015 at 3:24 PM, Amit Kapila 
>> wrote:
>>
>>> On Fri, Aug 14, 2015 at 7:23 PM, Alexander Korotkov <
>>> aekorot...@gmail.com> wrote:
>>> >
>>> > On Thu, Aug 6, 2015 at 1:01 PM, Ildus Kurbangaliev <
>>> i.kurbangal...@postgrespro.ru> wrote:
>>> >>
>>> >>
>>> >> I've looked deeper and I found PgBackendStatus to be not a suitable
>>> >> place for keeping information about low level waits. Really,
>>> PgBackendStatus
>>> >> is used to track high level information about backend. This is why
>>> auxiliary
>>> >> processes don't have PgBackendStatus, because they don't have such
>>> information
>>> >> to expose. But when we come to the low level wait events then
>>> auxiliary
>>> >> processes are as useful for monitoring as backends are. WAL writer,
>>> >> checkpointer, bgwriter etc are using LWLocks as well. This is
>>> certainly unclear
>>> >> why they can't be monitored.
>>> >>
>>> >> This is why I think we shouldn't place wait event into
>>> PgBackendStatus. It
>>> >> could be placed into PGPROC or even separate data structure with
>>> different
>>> >> concurrency model which would be most suitable for monitoring.
>>> >
>>> >
>>> > +1 for tracking wait events not only for backends
>>> >
>>> > Ildus, could you do following?
>>> > 1) Extract LWLocks refactoring into separate patch.
>>> > 2) Make a patch with storing current wait event information in PGPROC.
>>> >
>>>
>>> Now as Robert has committed standardization of lwlock names in
>>> commit - aa65de04, let us try to summarize and work on remaining parts
>>> of the patch. So I think now the next set of work is as follows:
>>>
>>> 1. Modify the tranche mechanism so that information about LWLocks
>>> can be tracked easily.  For this already there is some discussion, ideas
>>> and initial patch is floated in this thread and there doesn't seem to be
>>> much
>>> conflicts, so we can write the patch for it.  I am planning to write or
>>> modify
>>> the existing patch unless you, Ildus or anyone has objections or want to
>>> write it, please let me know to avoid duplication of work.
>>>
>>> 2. Track wait_event in pg_stat_activity.  Now here the main point where
>>> we doesn't seem to be in complete agreement is that shall we keep it
>>> as one byte in PgBackendStatus or use two or more bytes to store
>>> wait_event information and still there is another point made by you to
>>> store it in PGProc rather than in PgBackendStatus so that we can display
>>> information of background processes as well.
>>>
>>> Now as a matter of consensus, I think Tom has raised a fair point [1]
>>> against
>>> storing this information in PGProc and I feel that it is relatively less
>>> important to have such information about background processes and the
>>> reason for same is explained upthread [2].  About having more than
>>> one-byte
>>> to store information about various wait_events, I think now we will not
>>> have
>>> more than 100 or so events to track, do you really think that anytime in
>>> forseeable
>>> future we will have more than 256 important events which we would like
>>> to track?
>>>
>>> So I think about this lets first try to build the consensus and then
>>> attempt to
>>> write or modify the patch.
>>>
>>>
>>> [1] - http://www.postgresql.org/message-id/4067.1439561...@sss.pgh.pa.us
>>> [2] -
>>> http://www.postgresql.org/message-id/CAA4eK1JRQGTgRMRao6H65rA=fhifucnqhx7ongy58um6rn1...@mail.gmail.com
>>>
>>
>> In order to build the consensus we need the roadmap for waits monitoring.
>> Would single byte in PgBackendStatus be the only way for tracking wait
>> events? Could we have pluggable infrastructure in waits monitoring: for
>> instance, hooks for wait event begin and end?
>> Limit of 256 wait events is probably OK. But we have no room for any
>> additional parameters of wait event. For instance, if you notice high
>> contention for buffer content lock then it would be nice to figure out: how
>> many blocks are involved?, which blocks? We need to track additional
>> parameters in order to answer this question.
>>
>
> We can track additional parameters by default or based on some
> parameter, but do you think that tracking backends wait_event
> information as proposed hinders in any which way the future extensions
> in this area?  The point is that detailed discussion of other parameters
> could be better done separately unless you think that this can block
> future enhancements for waits monitoring.
>

Yes, I think this can block future enhancements for waits monitoring.
You're currently proposing to store the current wait event in just a single
byte. And it is a single byte not for reasons of space economy; it is so by
concurrency design.
Isn't it natural to ask how we are going to store anything more about the
current wait event? Should we store additional 

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Amit Kapila
On Mon, Sep 14, 2015 at 2:25 PM, Alexander Korotkov 
wrote:

> On Sat, Sep 12, 2015 at 2:05 PM, Amit Kapila 
> wrote:
>
>> On Thu, Aug 6, 2015 at 3:31 PM, Ildus Kurbangaliev <
>> i.kurbangal...@postgrespro.ru> wrote:
>> >
>> > On 08/05/2015 09:33 PM, Robert Haas wrote:
>> >>
>> >>
>> >> You're missing the point.  Those multi-byte fields have additional
>> >> synchronization requirements, as I explained in some detail in my
>> >> previous email. You can't just wave that away.
>> >
>> > I see that now. Thank you for the point.
>> >
>> > I've looked deeper and I found PgBackendStatus to be not a suitable
>> > place for keeping information about low level waits. Really,
>> PgBackendStatus
>> > is used to track high level information about backend. This is why
>> auxiliary
>> > processes don't have PgBackendStatus, because they don't have such
>> information
>> > to expose. But when we come to the low level wait events then auxiliary
>> > processes are as useful for monitoring as backends are. WAL writer,
>> > checkpointer, bgwriter etc are using LWLocks as well. This is certainly
>> unclear
>> > why they can't be monitored.
>> >
>>
>> I think the chances of background processes stuck in LWLock is quite less
>> as compare to backends as they do the activities periodically.  As an
>> example
>> WALWriter will take WALWriteLock to write the WAL, but actually there
>> will never
>> be any much contention for WALWriter. In synchronous_commit = on, the
>> backends themselves write the WAL so WALWriter won't do much in that
>> case and for synchronous_commit = off, backends won't write the WAL so
>> WALWriter won't face any contention unless some buffers have to be written
>> by bgwriter or checkpoint for which WAL is not flushed which I don't think
>> would lead to any contention.
>>
>
> Hmm, synchronous_commit is per session variable: some transactions could
> run with synchronous_commit = on, but some with synchronous_commit = off.
> This is very popular feature of PostgreSQL: achieve better performance by
> making non-critical transaction asynchronous while leaving critical
> transactions synchronous. Thus, contention for WALWriteLock between
> backends and WALWriter could be real.
>
>
I think it is difficult to say whether that can lead to contention, given
the periodic nature of WALWriter, but I don't deny that there is a chance
for background processes to have contention.


> I am not denying from the fact that there could be some contention in rare
>> scenarios for background processes, but I think tracking them is not as
>> important as tracking the LWLocks for backends.
>>
>
> I would be more careful in calling some of scenarios rare. As DBMS
> developers we should do our best to evade contention for LWLocks: any
> contention, not only between backends and background processes.  One may
> assume that high LWLock contention is rare scenario in general. Once we're
> here we doesn't think so, though.
> You claims that there couldn't be contention for WALWriteLock between
> backends and WALWriter. This is unclear for me: I think it could be.
>

I think there are more things than LWLocks that background processes could
wait on, and I think they are important to track, but that could be done
separately from tracking them for pg_stat_activity.  For example, we have a
pg_stat_bgwriter view; can't we think of tracking bgwriter/checkpointer
wait information in that view, and similarly track other background
processes in other views, either a related view if one exists or a new one
for all background processes?


> Nobody opposes tracking wait events for backends and tracking them for
> background processes. I think we need to track both in order to provide
> full picture to DBA.
>
>
Sure, that is good to do, but can't we do it separately in another patch?
I think in this patch let's just work on wait_events for backends.


> Also as we are planning to track the wait_event information in
>> pg_stat_activity
>> along with other backend information, it will not make sense to include
>> information about background processes in this variable, as pg_stat_activity
>> just displays information of backend processes.
>>
>
> I'm not objecting to tracking only backend information in
> pg_stat_activity. I think we should also have some other way of tracking
> wait events for background processes. We should think it through before
> extending pg_stat_activity, to avoid design issues later.
>
>
I think we can discuss it if you see any specific problems or you want
specific things to be clarified, but sorting out the complete design of
waits monitoring before this patch could extend the scope of this patch
beyond need.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Vladimir Borodin

> On Sep 12, 2015, at 14:05, Amit Kapila wrote:
> 
> On Thu, Aug 6, 2015 at 3:31 PM, Ildus Kurbangaliev 
> > wrote:
> >
> > On 08/05/2015 09:33 PM, Robert Haas wrote:
> >>
> >>
> >> You're missing the point.  Those multi-byte fields have additional
> >> synchronization requirements, as I explained in some detail in my
> >> previous email. You can't just wave that away.
> >
> > I see that now. Thank you for the point.
> >
> > I've looked deeper and I found PgBackendStatus to be not a suitable
> > place for keeping information about low-level waits. Really, PgBackendStatus
> > is used to track high-level information about a backend. This is why
> > auxiliary processes don't have PgBackendStatus: they don't have such
> > information to expose. But when we come to the low-level wait events,
> > auxiliary processes are as useful for monitoring as backends are. WAL
> > writer, checkpointer, bgwriter etc. are using LWLocks as well. It is
> > certainly unclear why they can't be monitored.
> >
> 
> I think the chances of background processes getting stuck on an LWLock are
> quite low compared to backends, as they do their activities periodically.
> As an example, WALWriter will take WALWriteLock to write the WAL, but in
> practice there will never be much contention for WALWriter. With
> synchronous_commit = on, the backends themselves write the WAL, so WALWriter
> won't do much in that case; and with synchronous_commit = off, backends
> won't write the WAL, so WALWriter won't face any contention unless some
> buffers have to be written by bgwriter or checkpointer for which WAL is not
> flushed, which I don't think would lead to any contention.

WALWriter is not a good example, IMHO. And monitoring LWLocks is not the only 
thing that waits monitoring brings us. Here [0] is an example where 
understanding what was happening inside the startup process took a long time 
and required GDB. With waits monitoring I could do a couple of SELECTs 
and use oid2name to understand the reason for the problem.

Also we should consider that PostgreSQL has a good infrastructure for 
parallelizing many auxiliary processes. Can we be sure that we will always 
have exactly one WAL writer process? Perhaps some time in the future we would 
need several of them, and there would be contention for WALWriteLock between 
them. Perhaps WAL writer is not a good example here either, but having 
multiple checkpointer or bgwriter processes in the near future seems very 
likely, no?

[0] 
http://www.postgresql.org/message-id/fe82a9a7-0d52-41b5-a9ed-967f6927c...@simply.name
 

> 
> I am not denying the fact that there could be some contention in rare
> scenarios for background processes, but I think tracking them is not as
> important as tracking the LWLocks for backends.
> 
> Also as we are planning to track the wait_event information in 
> pg_stat_activity
> along with other backend information, it will not make sense to include
> information about background processes in this variable, as pg_stat_activity
> just displays information of backend processes.
> 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com 

--
May the force be with you…
https://simply.name



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Alexander Korotkov
On Sat, Sep 12, 2015 at 2:05 PM, Amit Kapila 
wrote:

> On Thu, Aug 6, 2015 at 3:31 PM, Ildus Kurbangaliev <
> i.kurbangal...@postgrespro.ru> wrote:
> >
> > On 08/05/2015 09:33 PM, Robert Haas wrote:
> >>
> >>
> >> You're missing the point.  Those multi-byte fields have additional
> >> synchronization requirements, as I explained in some detail in my
> >> previous email. You can't just wave that away.
> >
> > I see that now. Thank you for the point.
> >
> > I've looked deeper and I found PgBackendStatus to be not a suitable
> > place for keeping information about low-level waits. Really, PgBackendStatus
> > is used to track high-level information about a backend. This is why
> > auxiliary processes don't have PgBackendStatus: they don't have such
> > information to expose. But when we come to the low-level wait events,
> > auxiliary processes are as useful for monitoring as backends are. WAL
> > writer, checkpointer, bgwriter etc. are using LWLocks as well. It is
> > certainly unclear why they can't be monitored.
> >
>
> I think the chances of background processes getting stuck on an LWLock are
> quite low compared to backends, as they do their activities periodically.
> As an example, WALWriter will take WALWriteLock to write the WAL, but in
> practice there will never be much contention for WALWriter. With
> synchronous_commit = on, the backends themselves write the WAL, so WALWriter
> won't do much in that case; and with synchronous_commit = off, backends
> won't write the WAL, so WALWriter won't face any contention unless some
> buffers have to be written by bgwriter or checkpointer for which WAL is not
> flushed, which I don't think would lead to any contention.
>

Hmm, synchronous_commit is a per-session variable: some transactions could
run with synchronous_commit = on, but some with synchronous_commit = off.
This is a very popular feature of PostgreSQL: achieving better performance by
making non-critical transactions asynchronous while leaving critical
transactions synchronous. Thus, contention for WALWriteLock between
backends and WALWriter could be real.

I am not denying the fact that there could be some contention in rare
> scenarios for background processes, but I think tracking them is not as
> important as tracking the LWLocks for backends.
>

I would be more careful about calling some of these scenarios rare. As DBMS
developers we should do our best to avoid contention for LWLocks: any
contention, not only between backends and background processes.  One may
assume that high LWLock contention is a rare scenario in general. We are here
precisely because we don't think so, though.
You claim that there couldn't be contention for WALWriteLock between
backends and WALWriter. This is unclear to me: I think it could be. Nobody
opposes tracking wait events for backends and tracking them for background
processes. I think we need to track both in order to provide the full picture
to the DBA.

Also as we are planning to track the wait_event information in
> pg_stat_activity
> along with other backend information, it will not make sense to include
> information about background processes in this variable, as pg_stat_activity
> just displays information of backend processes.
>

I'm not objecting to tracking only backend information in
pg_stat_activity. I think we should also have some other way of tracking
wait events for background processes. We should think it through before
extending pg_stat_activity, to avoid design issues later.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-14 Thread Alexander Korotkov
On Sat, Sep 12, 2015 at 3:24 PM, Amit Kapila 
wrote:

> On Fri, Aug 14, 2015 at 7:23 PM, Alexander Korotkov 
> wrote:
> >
> > On Thu, Aug 6, 2015 at 1:01 PM, Ildus Kurbangaliev <
> i.kurbangal...@postgrespro.ru> wrote:
> >>
> >>
> >> I've looked deeper and I found PgBackendStatus to be not a suitable
> >> place for keeping information about low-level waits. Really, PgBackendStatus
> >> is used to track high-level information about a backend. This is why
> >> auxiliary processes don't have PgBackendStatus: they don't have such
> >> information to expose. But when we come to the low-level wait events,
> >> auxiliary processes are as useful for monitoring as backends are. WAL
> >> writer, checkpointer, bgwriter etc. are using LWLocks as well. It is
> >> certainly unclear why they can't be monitored.
> >>
> >> This is why I think we shouldn't place the wait event into PgBackendStatus.
> >> It could be placed into PGPROC or even a separate data structure with a
> >> different concurrency model that would be most suitable for monitoring.
> >
> >
> > +1 for tracking wait events not only for backends
> >
> > Ildus, could you do following?
> > 1) Extract LWLocks refactoring into separate patch.
> > 2) Make a patch with storing current wait event information in PGPROC.
> >
>
> Now as Robert has committed the standardization of lwlock names in
> commit aa65de04, let us try to summarize and work on the remaining parts
> of the patch. So I think now the next set of work is as follows:
>
> 1. Modify the tranche mechanism so that information about LWLocks
> can be tracked easily.  For this there is already some discussion and
> ideas, and an initial patch has been floated in this thread, and there
> don't seem to be many conflicts, so we can write the patch for it.  I am
> planning to write or modify the existing patch unless you, Ildus, or
> anyone else has objections or wants to write it; please let me know to
> avoid duplication of work.
>
> 2. Track wait_event in pg_stat_activity.  Now here the main point where
> we don't seem to be in complete agreement is whether we shall keep it
> as one byte in PgBackendStatus or use two or more bytes to store
> wait_event information; and there is another point made by you, to
> store it in PGPROC rather than in PgBackendStatus so that we can display
> information about background processes as well.
>
> Now as a matter of consensus, I think Tom has raised a fair point [1]
> against storing this information in PGPROC, and I feel that it is
> relatively less important to have such information about background
> processes; the reason for the same is explained upthread [2].  About
> having more than one byte to store information about various wait_events,
> I think we will not have more than 100 or so events to track; do you
> really think that at any time in the foreseeable future we will have more
> than 256 important events which we would like to track?
>
> So about this, let's first try to build consensus and then attempt to
> write or modify the patch.
>
>
> [1] - http://www.postgresql.org/message-id/4067.1439561...@sss.pgh.pa.us
> [2] -
> http://www.postgresql.org/message-id/CAA4eK1JRQGTgRMRao6H65rA=fhifucnqhx7ongy58um6rn1...@mail.gmail.com
>

In order to build consensus we need a roadmap for waits monitoring.
Would a single byte in PgBackendStatus be the only way of tracking wait
events? Could we have pluggable infrastructure in waits monitoring: for
instance, hooks for wait event begin and end?
A limit of 256 wait events is probably OK. But then we have no room for any
additional parameters of a wait event. For instance, if you notice high
contention for a buffer content lock, then it would be nice to figure out:
how many blocks are involved, and which blocks? We need to track additional
parameters in order to answer this question.
Comments about tracking only backends are in [1] and [2].
PGPROC is not the only way to track events with parameters for every
process. We could try to extend PgBackendStatus or build another secondary
data structure.

However, this depends on our general roadmap. If pg_stat_activity is
just the default, while full waits monitoring could be a plugin, then
it's probably OK.

[1] -
http://www.postgresql.org/message-id/capphfdsnju40qud2sqjs_iu+z-jm-b_f39foc7eydjvza5e...@mail.gmail.com
[2] -
http://www.postgresql.org/message-id/9a99c2a7-1760-419f-bdc9-a2cf99ecd...@simply.name

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-13 Thread Ildus Kurbangaliev

> On Sep 13, 2015, at 5:36 AM, Robert Haas  wrote:
> 
> On Sat, Sep 12, 2015 at 8:24 AM, Amit Kapila  wrote:
>> 1. Modify the tranche mechanism so that information about LWLocks
>> can be tracked easily.  For this there is already some discussion and
>> ideas, and an initial patch has been floated in this thread, and there
>> don't seem to be many conflicts, so we can write the patch for it.  I am
>> planning to write or modify the existing patch unless you, Ildus, or
>> anyone else has objections or wants to write it; please let me know to
>> avoid duplication of work.
> 
> What I'd like to see happen here is two new API calls.  During the
> early initialization (before shared memory sizing, and during
> process_shared_preload_libraries), backends in either the core code or
> a loadable module can call RequestLWLockTranche(char *, int) to
> request a tranche with the given name and number of locks.  Then, when
> shared memory is created, the core code creates a tranche which is
> part of MainLWLockArray.  The base of the tranche points to the first
> lock in that tranche, and the tranche is automatically registered for
> all subsequent backends.  In EXEC_BACKEND builds, this requires
> stashing the LWLockTranche and the name to which it points in shared
> memory somewhere, so that exec'd backends can look at shared memory
> and redo the registration.  In non-EXEC_BACKEND builds the values can
> just be inherited via fork.  Then, we add a second API call
> LookupLWTrancheByName(char *) which does just that.  This gets used to
> initialize backend-private pointers to the various tranches.
> 
> Besides splitting apart the main tranche into a bunch of tranches with
> individual names so that we can identify lwlocks easily, this approach
> makes sure that the number of locks requested by an extension matches
> the number it actually gets.  Today, frustratingly, an extension that
> requests one number of locks and uses another may work or fail
> unpredictably depending on what other extensions are loaded and what
> they do.  This would eliminate that nuisance: the new APIs would
> obsolete RequestAddinLWLocks, which I would propose to remove.
> 
> Thoughts?

This is pretty much the same as what my patch does. There are
two API calls (for size determination and tranche creation), except that
MainLWLockArray is used only for individual LWLocks.

Also, I suggest keeping RequestAddinLWLocks for backward compatibility.


Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-13 Thread Robert Haas
On Sun, Sep 13, 2015 at 11:09 AM, Ildus Kurbangaliev
 wrote:
> This is pretty much the same as what my patch does. There are
> two API calls (for size determination and tranche creation), except that
> MainLWLockArray is used only for individual LWLocks.

It's not really the same.  Your patch doesn't provide any interlock to
ensure that the number of locks requested for a particular subsystem
during shmem sizing is the same as the number actually created during
shmem setup.  That's an interlock I'd really like to have.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-12 Thread Robert Haas
On Sat, Sep 12, 2015 at 8:24 AM, Amit Kapila  wrote:
> 1. Modify the tranche mechanism so that information about LWLocks
> can be tracked easily.  For this there is already some discussion and
> ideas, and an initial patch has been floated in this thread, and there
> don't seem to be many conflicts, so we can write the patch for it.  I am
> planning to write or modify the existing patch unless you, Ildus, or
> anyone else has objections or wants to write it; please let me know to
> avoid duplication of work.

What I'd like to see happen here is two new API calls.  During the
early initialization (before shared memory sizing, and during
process_shared_preload_libraries), backends in either the core code or
a loadable module can call RequestLWLockTranche(char *, int) to
request a tranche with the given name and number of locks.  Then, when
shared memory is created, the core code creates a tranche which is
part of MainLWLockArray.  The base of the tranche points to the first
lock in that tranche, and the tranche is automatically registered for
all subsequent backends.  In EXEC_BACKEND builds, this requires
stashing the LWLockTranche and the name to which it points in shared
memory somewhere, so that exec'd backends can look at shared memory
and redo the registration.  In non-EXEC_BACKEND builds the values can
just be inherited via fork.  Then, we add a second API call
LookupLWTrancheByName(char *) which does just that.  This gets used to
initialize backend-private pointers to the various tranches.

Besides splitting apart the main tranche into a bunch of tranches with
individual names so that we can identify lwlocks easily, this approach
makes sure that the number of locks requested by an extension matches
the number it actually gets.  Today, frustratingly, an extension that
requests one number of locks and uses another may work or fail
unpredictably depending on what other extensions are loaded and what
they do.  This would eliminate that nuisance: the new APIs would
obsolete RequestAddinLWLocks, which I would propose to remove.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-12 Thread Amit Kapila
On Thu, Aug 6, 2015 at 3:31 PM, Ildus Kurbangaliev <
i.kurbangal...@postgrespro.ru> wrote:
>
> On 08/05/2015 09:33 PM, Robert Haas wrote:
>>
>>
>> You're missing the point.  Those multi-byte fields have additional
>> synchronization requirements, as I explained in some detail in my
>> previous email. You can't just wave that away.
>
> I see that now. Thank you for the point.
>
> I've looked deeper and I found PgBackendStatus to be not a suitable
> place for keeping information about low-level waits. Really, PgBackendStatus
> is used to track high-level information about a backend. This is why
> auxiliary processes don't have PgBackendStatus: they don't have such
> information to expose. But when we come to the low-level wait events,
> auxiliary processes are as useful for monitoring as backends are. WAL
> writer, checkpointer, bgwriter etc. are using LWLocks as well. It is
> certainly unclear why they can't be monitored.
>

I think the chances of background processes getting stuck on an LWLock are
quite low compared to backends, as they do their activities periodically.
As an example, WALWriter will take WALWriteLock to write the WAL, but in
practice there will never be much contention for WALWriter. With
synchronous_commit = on, the backends themselves write the WAL, so WALWriter
won't do much in that case; and with synchronous_commit = off, backends
won't write the WAL, so WALWriter won't face any contention unless some
buffers have to be written by bgwriter or checkpointer for which WAL is not
flushed, which I don't think would lead to any contention.

I am not denying the fact that there could be some contention in rare
scenarios for background processes, but I think tracking them is not as
important as tracking the LWLocks for backends.

Also as we are planning to track the wait_event information in
pg_stat_activity
along with other backend information, it will not make sense to include
information about background processes in this variable, as pg_stat_activity
just displays information of backend processes.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-12 Thread Amit Kapila
On Fri, Aug 14, 2015 at 7:23 PM, Alexander Korotkov 
wrote:
>
> On Thu, Aug 6, 2015 at 1:01 PM, Ildus Kurbangaliev <
i.kurbangal...@postgrespro.ru> wrote:
>>
>>
>> I've looked deeper and I found PgBackendStatus to be not a suitable
>> place for keeping information about low-level waits. Really, PgBackendStatus
>> is used to track high-level information about a backend. This is why
>> auxiliary processes don't have PgBackendStatus: they don't have such
>> information to expose. But when we come to the low-level wait events,
>> auxiliary processes are as useful for monitoring as backends are. WAL
>> writer, checkpointer, bgwriter etc. are using LWLocks as well. It is
>> certainly unclear why they can't be monitored.
>>
>> This is why I think we shouldn't place the wait event into PgBackendStatus.
>> It could be placed into PGPROC or even a separate data structure with a
>> different concurrency model that would be most suitable for monitoring.
>
>
> +1 for tracking wait events not only for backends
>
> Ildus, could you do following?
> 1) Extract LWLocks refactoring into separate patch.
> 2) Make a patch with storing current wait event information in PGPROC.
>

Now as Robert has committed the standardization of lwlock names in
commit aa65de04, let us try to summarize and work on the remaining parts
of the patch. So I think now the next set of work is as follows:

1. Modify the tranche mechanism so that information about LWLocks
can be tracked easily.  For this there is already some discussion and
ideas, and an initial patch has been floated in this thread, and there
don't seem to be many conflicts, so we can write the patch for it.  I am
planning to write or modify the existing patch unless you, Ildus, or
anyone else has objections or wants to write it; please let me know to
avoid duplication of work.

2. Track wait_event in pg_stat_activity.  Now here the main point where
we don't seem to be in complete agreement is whether we shall keep it
as one byte in PgBackendStatus or use two or more bytes to store
wait_event information; and there is another point made by you, to
store it in PGPROC rather than in PgBackendStatus so that we can display
information about background processes as well.

Now as a matter of consensus, I think Tom has raised a fair point [1]
against storing this information in PGPROC, and I feel that it is
relatively less important to have such information about background
processes; the reason for the same is explained upthread [2].  About
having more than one byte to store information about various wait_events,
I think we will not have more than 100 or so events to track; do you
really think that at any time in the foreseeable future we will have more
than 256 important events which we would like to track?

So about this, let's first try to build consensus and then attempt to
write or modify the patch.


[1] - http://www.postgresql.org/message-id/4067.1439561...@sss.pgh.pa.us
[2] -
http://www.postgresql.org/message-id/CAA4eK1JRQGTgRMRao6H65rA=fhifucnqhx7ongy58um6rn1...@mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-02 Thread Robert Haas
On Tue, Sep 1, 2015 at 6:43 PM, and...@anarazel.de  wrote:
> Why a new tranche for each of these? And it can't be correct that each
> has the same base?

I complained about the same-base problem before.  Apparently, that got ignored.

> I don't really like the tranche model as in the patch right now. I'd
> rather have it in a way where we have one tranche for all the individual
> lwlocks, and the tranche points to an array of names alongside the
> tranche's name. And then for the others we just supply the tranche name
> but leave the name array empty, in which case a name can be generated.

That's an interesting idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-09-01 Thread and...@anarazel.de
On 2015-08-04 23:37:08 +0300, Ildus Kurbangaliev wrote:
> diff --git a/src/backend/access/transam/clog.c 
> b/src/backend/access/transam/clog.c
> index 3a58f1e..10c25cf 100644
> --- a/src/backend/access/transam/clog.c
> +++ b/src/backend/access/transam/clog.c
> @@ -457,7 +457,8 @@ CLOGShmemInit(void)
>  {
>   ClogCtl->PagePrecedes = CLOGPagePrecedes;
>   SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), 
> CLOG_LSNS_PER_PAGE,
> -   CLogControlLock, "pg_clog");
> +   CLogControlLock, "pg_clog",
> +   "CLogBufferLocks");
>  }

I'd rather just add the name "clog" (etc.) as a string once to
SimpleLruInit instead of three times as now.

> +/* Lock names. For monitoring purposes */
> +const char *LOCK_NAMES[] =
> +{
> + "Relation",
> + "RelationExtend",
> + "Page",
> + "Tuple",
> + "Transaction",
> + "VirtualTransaction",
> + "SpeculativeToken",
> + "Object",
> + "Userlock",
> + "Advisory"
> +};

Why do we need this rather than DescribeLockTag?

> + /* Create tranches for individual LWLocks */
> + for (i = 0; i < NUM_INDIVIDUAL_LWLOCKS; i++, tranche++)
> + {
> + int id = LWLockNewTrancheId();
> +
> +		/*
> +		 * We need to be sure that the generated id is equal to the index
> +		 * for individual LWLocks
> +		 */
> +		Assert(id == i);
> +
> +		tranche->array_base = MainLWLockArray;
> +		tranche->array_stride = sizeof(LWLockPadded);
> +		MemSet(tranche->name, 0, LWLOCK_MAX_TRANCHE_NAME);
> +
> +		/* Initialize individual LWLock */
> +		LWLockInitialize(&MainLWLockArray[i].lock, id);
> +
> +		/* Register new tranche in tranches array */
> +		LWLockRegisterTranche(id, tranche);
> +	}
> +
> + /* Fill individual LWLock names */
> + InitLWLockNames();

Why a new tranche for each of these? And it can't be correct that each
has the same base?



I don't really like the tranche model as in the patch right now. I'd
rather have it in a way where we have one tranche for all the individual
lwlocks, and the tranche points to an array of names alongside the
tranche's name. And then for the others we just supply the tranche name
but leave the name array empty, in which case a name can be generated.

Greetings,

Andres Freund




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-26 Thread Bruce Momjian
On Mon, Jul 27, 2015 at 04:10:14PM -0500, Jim Nasby wrote:
 Sure, but I don't think this makes it impossible to figure out who's
 locking who.  I think the only thing you need other than the data in
 pg_locks is the conflicts table, which is well documented.
 
 Oh, hmm, one thing missing is the ordering of the wait queue for each
 locked object.  If process A holds RowExclusive on some object, process
 B wants ShareLock (stalled waiting) and process C wants AccessExclusive
 (also stalled waiting), who of B and C is woken up first after A
 releases the lock depends on order of arrival.
 
 Agreed - it would be nice to expose that somehow.
 
 +1. It's very common to want to know who's blocking who, and not at
 all easy to do that today. We should at minimum have a canonical
 example of how to do it, but something built in would be even
 better.

Coming in late here, but have you looked at my locking presentation?  I
think there are examples in there:

http://momjian.us/main/writings/pgsql/locking.pdf

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-14 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 On Thu, Aug 6, 2015 at 1:01 PM, Ildus Kurbangaliev 
 i.kurbangal...@postgrespro.ru wrote:
 This is why I think we shouldn't place the wait event into PgBackendStatus. It
 could be placed into PGPROC or even a separate data structure with a different
 concurrency model which would be most suitable for monitoring.

 +1 for tracking wait events not only for backends

 Ildus, could you do following?
 1) Extract LWLocks refactoring into separate patch.
 2) Make a patch with storing current wait event information in PGPROC.

What will this accomplish exactly, other than making it more complicated
to make a copy of the information when we capture an activity snapshot?
You'll have to get data out of two places, which do not have any
synchronization protocol defined between them.

regards, tom lane




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-14 Thread Alexander Korotkov
On Thu, Aug 6, 2015 at 1:01 PM, Ildus Kurbangaliev 
i.kurbangal...@postgrespro.ru wrote:

 On 08/05/2015 09:33 PM, Robert Haas wrote:

 On Wed, Aug 5, 2015 at 1:10 PM, Ildus Kurbangaliev
 i.kurbangal...@postgrespro.ru wrote:

 Regarding `memcpy`, the PgBackendStatus struct already has a bunch of
 multi-byte variables, so it will not be consistent anyway if somebody
 wants to copy it in that way. On the other hand, two bytes in this case
 give less overhead because we can avoid the offset calculations. And as
 I've mentioned before, the class of wait will be useful when monitoring
 of waits is extended.

 You're missing the point.  Those multi-byte fields have additional
 synchronization requirements, as I explained in some detail in my
 previous email. You can't just wave that away.

 I see that now. Thank you for the point.

 I've looked deeper and I found PgBackendStatus to be not a suitable
 place for keeping information about low-level waits. Really, PgBackendStatus
 is used to track high-level information about a backend. This is why
 auxiliary processes don't have PgBackendStatus: they don't have such
 information to expose. But when we come to the low-level wait events,
 auxiliary processes are as useful for monitoring as backends are. WAL
 writer, checkpointer, bgwriter etc. are using LWLocks as well. It is
 certainly unclear why they can't be monitored.

 This is why I think we shouldn't place the wait event into PgBackendStatus. It
 could be placed into PGPROC or even a separate data structure with a different
 concurrency model which would be most suitable for monitoring.


+1 for tracking wait events not only for backends

Ildus, could you do following?
1) Extract LWLocks refactoring into separate patch.
2) Make a patch with storing current wait event information in PGPROC.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-06 Thread Ildus Kurbangaliev

On 08/05/2015 09:33 PM, Robert Haas wrote:

On Wed, Aug 5, 2015 at 1:10 PM, Ildus Kurbangaliev
i.kurbangal...@postgrespro.ru wrote:

About `memcpy`: the PgBackendStatus struct already has a bunch of multi-byte
variables, so it will not be consistent anyway if somebody wants to copy
it that way. On the other hand, two bytes in this case give less overhead
because we can avoid the offset calculations. And as I've mentioned before,
the wait class will be useful when monitoring of waits is extended.

You're missing the point.  Those multi-byte fields have additional
synchronization requirements, as I explained in some detail in my
previous email. You can't just wave that away.

I see that now. Thank you for the point.

I've looked deeper, and I found that PgBackendStatus is not a suitable
place for keeping information about low-level waits. PgBackendStatus is
really used to track high-level information about a backend. This is why
auxiliary processes don't have a PgBackendStatus: they don't have such
information to expose. But when we come to low-level wait events, auxiliary
processes are as useful for monitoring as backends are. The WAL writer,
checkpointer, bgwriter, etc. use LWLocks as well, so it is unclear why
they can't be monitored.

This is why I think we shouldn't place the wait event into PgBackendStatus.
It could be placed into PGPROC, or even a separate data structure with a
different concurrency model that would be most suitable for monitoring.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-05 Thread Amit Kapila
On Tue, Aug 4, 2015 at 5:45 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jul 28, 2015 at 3:28 PM, Heikki Linnakangas hlinn...@iki.fi
wrote:
  * The patch requires that the LWLOCK_INDIVIDUAL_NAMES array is kept in
sync
  with the list of individual locks in lwlock.h. Sooner or later someone
will
  add an LWLock and forget to update the names-array. That needs to be
made
  less error-prone, so that the names are maintained in the same place as
the
  #defines. Perhaps something like rmgrlist.h.

 This is a good idea, but it's not easy to do in the style of
 rmgrlist.h, because I don't believe there's any way to define a macro
 that expands to a preprocessor directive.  Attached is a patch that
 instead generates the list of macros from a text file, and also
 generates an array inside lwlock.c with the lock names that gets used
 by the Trace_lwlocks stuff where applicable.

 Any objections to this solution to the problem?  If not, I'd like to
 go ahead and push this much.  I can't test the Windows changes
 locally, though, so it would be helpful if someone could check that
 out.


Okay, I have checked it on Windows and found the below issues, which I
have fixed in the patch attached to this mail.

1. Got below error while running mkvcbuild.pl

Use of uninitialized value in numeric lt (<) at Solution.pm line 103.
Could not open src/backend/storage/lmgr/lwlocknames.h at Mkvcbuild.pm line
674.

+ if (IsNewer(
+ 'src/backend/storage/lmgr/lwlocknames.txt',
'src/include/storage/lwlocknames.h'))

I think the usage of old and new filenames in IsNewer is reversed here.

2.
+ print "Generating lwlocknames.c and fmgroids.h...\n";

Here it should be lwlocknames.h instead of fmgroids.h

3.
+ if (IsNewer(
+ 'src/include/storage/lwlocknames.h',
+
'src/backend/storage/lmgr/fmgroids.h'))

Here, it seems lwlocknames.h needs to be compared instead of fmgroids.h

4. Got below error while running mkvcbuild.pl
Generating lwlocknames.c and fmgroids.h...
/* autogenerated from src/backend/storage/lmgr/lwlocknames.txt, do not edit
*/
/* there is deliberately not an #ifndef LWLOCKNAMES_H here */
rename: lwlocknames.h.tmp3180: Permission denied at generate-lwlocknames.pl
line 59, <$lwlocknames> line 47.

In generate-lwlocknames.pl, below line is causing problem.

rename($htmp, 'lwlocknames.h') || die "rename: $htmp: $!";

The reason is that closing of tmp files is missing.


Some other general observations

5.
On running perl mkvcbuild.pl in Windows, below message is generated.

Generating lwlocknames.c and lwlocknames.h...
/* autogenerated from src/backend/storage/lmgr/lwlocknames.txt, do not edit
*/
/* there is deliberately not an #ifndef LWLOCKNAMES_H here */

The preceding message looks slightly odd: displaying it as C comments
makes it look out of place, and other similar .pl files don't generate
such messages. Having said that, it displays useful information, so it
might be okay to retain it.

6.
+
+maintainer-clean: clean
+ rm -f lwlocknames.h

Here, don't we need to clean lwlocknames.c?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


lwlocknames-v2.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-05 Thread Robert Haas
On Tue, Aug 4, 2015 at 8:46 PM, Josh Berkus j...@agliodbs.com wrote:
 On 06/25/2015 07:50 AM, Tom Lane wrote:
 To do that, we'd have to change the semantics of the 'waiting' column so
 that it becomes true for non-heavyweight-lock waits.  I'm not sure whether
 that's a good idea or not; I'm afraid there may be client-side code that
 expects 'waiting' to indicate that there's a corresponding row in
 pg_locks.  If we're willing to do that, then I'd be okay with
 allowing wait_status to be defined as last thing waited for; but the
 two points aren't separable.

 Speaking as someone who writes a lot of monitoring and alerting code,
 changing the meaning of the waiting column is OK as long as there's
 still a boolean column named waiting and it means query blocked in
 some way.

 Users are used to pg_stat_activity.waiting failing to join against
 pg_locks ... for one thing, there's timing issues there.  So pretty much
 everything I've seen uses outer joins anyway.

All of that is exactly how I feel about it, too.  It's not totally
painless to redefine waiting, but we're not proposing a *big* change
in semantics.  The way I see it, if we change this now, some people
will need to adjust, but it won't really be a big deal.  If we insist
that waiting is graven in stone, then in 5 years people will still
be wondering why the waiting column is inconsistent with the
wait_state column.  That's not going to be a win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-05 Thread Ildus Kurbangaliev

On 08/04/2015 11:47 PM, Robert Haas wrote:

On Tue, Aug 4, 2015 at 4:37 PM, Ildus Kurbangaliev
i.kurbangal...@postgrespro.ru wrote:

A new version of the patch. I used your idea with macros, and with
tranches, which allowed us to remove the array with names (they can be
written directly to the corresponding tranche).

You seem not to have addressed a few of the points I brought up here:

http://www.postgresql.org/message-id/CA+TgmoaGqhah0VTamsfaOMaE9uOrCPYSXN8hCS9=wirupjs...@mail.gmail.com



About `memcpy`: the PgBackendStatus struct already has a bunch of
multi-byte variables, so it will not be consistent anyway if somebody
wants to copy it that way. On the other hand, two bytes in this case
give less overhead because we can avoid the offset calculations. And as
I've mentioned before, the wait class will be useful when monitoring of
waits is extended.

Other things from that patch already changed in latest patch.

On 08/04/2015 11:53 PM, Alvaro Herrera wrote:

Just a bystander here, I haven't reviewed this patch at all, but I have
two questions,

1. have you tested this under -DEXEC_BACKEND ?  I wonder if those
initializations are going to work on Windows.

No, it wasn't tested on Windows

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-05 Thread Robert Haas
On Wed, Aug 5, 2015 at 1:10 PM, Ildus Kurbangaliev
i.kurbangal...@postgrespro.ru wrote:
 About `memcpy`: the PgBackendStatus struct already has a bunch of multi-byte
 variables, so it will not be consistent anyway if somebody wants to copy
 it that way. On the other hand, two bytes in this case give less overhead
 because we can avoid the offset calculations. And as I've mentioned before,
 the wait class will be useful when monitoring of waits is extended.

You're missing the point.  Those multi-byte fields have additional
synchronization requirements, as I explained in some detail in my
previous email. You can't just wave that away.

When people raise points in review, you need to respond to those with
discussion, not just blast out a new patch version that may or may not
have made some changes in that area.  Otherwise, you're wasting the
time of the people who are reviewing, which is not nice.

 1. have you tested this under -DEXEC_BACKEND ?  I wonder if those
 initializations are going to work on Windows.

 No, it wasn't tested on Windows

I don't think it's going to work on Windows.
CreateSharedMemoryAndSemaphores() is called once only from the
postmaster on non-EXEC_BACKEND builds, but on EXEC_BACKEND builds
(i.e. Windows) it's called for every process startup.  Thus, it's got
to be idempotent: if the shared memory structures it's looking for
already exist, it must not try to recreate them.  You have, for
example, InitBufferPool() calling LWLockCreateTranche(), which
unconditionally assigns a new tranche ID.  It can't do that; all of
the processes in the system have to agree on what the tranche IDs are.

The general problem that I see with splitting apart the main lwlock
array into many tranches is that all of those tranches need to get set
up properly - with matching tranche IDs - in every backend.  In
non-EXEC_BACKEND builds, that's basically free, but on EXEC_BACKEND
builds it isn't.  I think that's OK; this won't be the first piece of
state where EXEC_BACKEND builds incur some extra overhead.  But we
should make an effort to keep that overhead small.

The way to do that, I think, is to store some state in shared memory
that, in EXEC_BACKEND builds, will allow new postmaster children to
correctly re-register all of the tranches.  It seems to me that we can
do this as follows:

1. Have a compiled-in array of built-in tranche names.
2. In LWLockShmemSize(), work out the number of lwlocks each tranche
should contain.
3. In CreateLWLocks(), if IsUnderPostmaster, grab enough shared memory
for all the lwlocks themselves plus enough extra shared memory for one
pointer per tranche.  Store pointers to the start of each tranche in
shared memory, and initialize all the lwlocks.
4. In CreateLWLocks(), if tranche registration has not yet been done
(either because we are the postmaster, or because this is the
EXEC_BACKEND case) loop over the array of built-in tranche names and
register each one with the corresponding address grabbed from shared
memory.

A more radical redesign would be to have each tranche of LWLocks as a
separate chunk of shared memory, registered with ShmemInitStruct(),
and let EXEC_BACKEND children find those chunks by name using
ShmemIndex.  But that seems like more work to me for not much benefit,
especially since there's this weird thing where lwlocks are
initialized before ShmemIndex gets set up.

Yet another possible approach is to have each module register its own
tranche and track its own tranche ID using the same kind of strategy
that replication origins and WAL insert locks already employ.  That
may well be the right long-term strategy, but it seems like sort of a
pain to go to all that effort right now for this project, so I'm inclined
to hack on the approach described above and see if I can get that working
for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-04 Thread Ildus Kurbangaliev

On 08/03/2015 04:25 PM, Ildus Kurbangaliev wrote:

On 07/28/2015 10:28 PM, Heikki Linnakangas wrote:

On 07/27/2015 01:20 PM, Ildus Kurbangaliev wrote:

Hello.
In the attached patch I've made a refactoring for tranches.
The prefix for them was extended, and I split LWLockAssign into two
functions (one with a tranche and a second for user-defined LWLocks).


This needs some work in order to be maintainable:

* The patch requires that the LWLOCK_INDIVIDUAL_NAMES array is kept 
in sync with the list of individual locks in lwlock.h. Sooner or 
later someone will add an LWLock and forget to update the 
names-array. That needs to be made less error-prone, so that the 
names are maintained in the same place as the #defines. Perhaps 
something like rmgrlist.h.


* The base tranches are a bit funny. They all have the same 
array_base, pointing to MainLWLockArray. If there are e.g. 5 clog 
buffer locks, I would expect the T_NAME() to return ClogBufferLocks 
for all of them, and T_ID() to return numbers between 0-4. But in 
reality, T_ID() will return something like 55-59.


Instead of passing a tranche-id to LWLockAssign(), I think it would 
be more clear to have a new function to allocate a contiguous block 
of lwlocks as a new tranche. It could then set the base correctly.


* Instead of having LWLOCK_INDIVIDUAL_NAMES to name individual 
locks, how about just giving each one of them a separate tranche?


* User manual needs to be updated to explain the new column in 
pg_stat_activity.


- Heikki



Hello. Thanks for the review. I attached a new version of the patch.

It adds a new field in pg_stat_activity that shows the current wait in a backend.

I did some refactoring of the LWLock tranche mechanism. In lwlock.c only
individual and user-defined LWLocks are created; other LWLocks are created
by the modules that need them. I think that is more logical (the user knows
about the module, not the module about all of its users). It also
simplifies LWLock acquisition.

Now each individual LWLock and each other group of LWLocks has its own
tranche and can be easily identified. If somebody adds a new individual
LWLock and forgets to add its name, postgres will show a message.
Individual LWLocks are still allocated in one array, but their tranches
point to particular indexes in the main array.

Sample:

b1=# select pid, wait_event from pg_stat_activity; \watch 1

 pid  |   wait_event
--+-
 7722 | Storage: READ
 7653 |
 7723 | Network: WRITE
 7725 | Network: READ
 7727 | Locks: Transaction
 7731 | Storage: READ
 7734 | Network: READ
 7735 | Storage: READ
 7739 | LWLocks: WALInsertLocks
 7738 | Locks: Transaction
 7740 | LWLocks: BufferMgrLocks
 7741 | Network: READ
 7742 | Network: READ
 7743 | Locks: Transaction


Attached is a new version of the patch with some small fixes in the code.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e64b7ef..2e4e67e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -630,6 +630,11 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
 <entry>True if this backend is currently waiting on a lock</entry>
 </row>
 <row>
+ <entry><structfield>wait_event</></entry>
+ <entry><type>text</></entry>
+ <entry>Name of a current wait event in backend</entry>
+</row>
+<row>
 <entry><structfield>state</></entry>
 <entry><type>text</></entry>
 <entry>Current overall state of this backend.
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3a58f1e..10c25cf 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -457,7 +457,8 @@ CLOGShmemInit(void)
{
	ClogCtl->PagePrecedes = CLOGPagePrecedes;
	SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-  CLogControlLock, "pg_clog");
+  CLogControlLock, "pg_clog",
+  "CLogBufferLocks");
}
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 5ad35c0..dd7578f 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -466,7 +466,8 @@ CommitTsShmemInit(void)
 
	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
	SimpleLruInit(CommitTsCtl, "CommitTs Ctl", CommitTsShmemBuffers(), 0,
-  CommitTsControlLock, "pg_commit_ts");
+  CommitTsControlLock, "pg_commit_ts",
+  "CommitTSBufferLocks");

	commitTsShared = ShmemInitStruct("CommitTs shared",
 	 sizeof(CommitTimestampShared),
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1933a87..b905c59 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1842,10 +1842,12 @@ MultiXactShmemInit(void)
 
	SimpleLruInit(MultiXactOffsetCtl,
  "MultiXactOffset Ctl", NUM_MXACTOFFSET_BUFFERS, 0,
-  MultiXactOffsetControlLock, "pg_multixact/offsets");
+  

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-04 Thread Ildus Kurbangaliev

On 08/04/2015 03:15 PM, Robert Haas wrote:

On Tue, Jul 28, 2015 at 3:28 PM, Heikki Linnakangashlinn...@iki.fi  wrote:

* The patch requires that the LWLOCK_INDIVIDUAL_NAMES array is kept in sync
with the list of individual locks in lwlock.h. Sooner or later someone will
add an LWLock and forget to update the names-array. That needs to be made
less error-prone, so that the names are maintained in the same place as the
#defines. Perhaps something like rmgrlist.h.

This is a good idea, but it's not easy to do in the style of
rmgrlist.h, because I don't believe there's any way to define a macro
that expands to a preprocessor directive.  Attached is a patch that
instead generates the list of macros from a text file, and also
generates an array inside lwlock.c with the lock names that gets used
by the Trace_lwlocks stuff where applicable.

Any objections to this solution to the problem?  If not, I'd like to
go ahead and push this much.  I can't test the Windows changes
locally, though, so it would be helpful if someone could check that
out.

In my latest patch I still have an array with names, but postgres will
show a message if somebody adds an individual LWLock and forgets to add
its name. Code generation is also a solution, and if committers support
it I'll merge it into the main patch.


--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-04 Thread Heikki Linnakangas

On 08/04/2015 03:15 PM, Robert Haas wrote:

On Tue, Jul 28, 2015 at 3:28 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

* The patch requires that the LWLOCK_INDIVIDUAL_NAMES array is kept in sync
with the list of individual locks in lwlock.h. Sooner or later someone will
add an LWLock and forget to update the names-array. That needs to be made
less error-prone, so that the names are maintained in the same place as the
#defines. Perhaps something like rmgrlist.h.


This is a good idea, but it's not easy to do in the style of
rmgrlist.h, because I don't believe there's any way to define a macro
that expands to a preprocessor directive.  Attached is a patch that
instead generates the list of macros from a text file, and also
generates an array inside lwlock.c with the lock names that gets used
by the Trace_lwlocks stuff where applicable.

Any objections to this solution to the problem?  If not, I'd like to
go ahead and push this much.  I can't test the Windows changes
locally, though, so it would be helpful if someone could check that
out.


A more low-tech solution would be to something like this in lwlocknames.c:

static char *MainLWLockNames[NUM_INDIVIDUAL_LWLOCKS];

/* Turn a pointer to one of the LWLocks in the main array into an index number */
#define NAME_LWLOCK(l, name) MainLWLockNames[(l - MainLWLockArray)] = name

InitLWLockNames()
{
  NAME_LWLOCK(ShmemIndexLock, "ShmemIndexLock");
  NAME_LWLOCK(OidGenLock, "OidGenLock");
  ...
}

That would not be auto-generated, so you'd need to keep that list in 
sync with lwlock.h, but it would be much better than the original patch 
because if you forgot to add an entry in the names-array, the numbering 
of all the other locks would not go wrong. And you could have a runtime 
check that complains if there's an entry missing, like Ildus did in his 
latest patch.


I have no particular objection to your perl script either, though. I'll 
leave it up to you.


- Heikki





Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-04 Thread Robert Haas
On Tue, Jul 28, 2015 at 3:28 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 * The patch requires that the LWLOCK_INDIVIDUAL_NAMES array is kept in sync
 with the list of individual locks in lwlock.h. Sooner or later someone will
 add an LWLock and forget to update the names-array. That needs to be made
 less error-prone, so that the names are maintained in the same place as the
 #defines. Perhaps something like rmgrlist.h.

This is a good idea, but it's not easy to do in the style of
rmgrlist.h, because I don't believe there's any way to define a macro
that expands to a preprocessor directive.  Attached is a patch that
instead generates the list of macros from a text file, and also
generates an array inside lwlock.c with the lock names that gets used
by the Trace_lwlocks stuff where applicable.

Any objections to this solution to the problem?  If not, I'd like to
go ahead and push this much.  I can't test the Windows changes
locally, though, so it would be helpful if someone could check that
out.

 * Instead of having LWLOCK_INDIVIDUAL_NAMES to name individual locks, how
 about just giving each one of them a separate tranche?

I don't think it's good to split things up to that degree;
standardizing on one name per fixed lwlock and one per tranche
otherwise seems like a good compromise to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 98b978f..d16ab10 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -106,7 +106,7 @@ endif
 endif # aix
 
 # Update the commonly used headers before building the subdirectories
-$(SUBDIRS:%=%-recursive): $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src/include/catalog/schemapg.h $(top_builddir)/src/include/utils/fmgroids.h $(top_builddir)/src/include/utils/errcodes.h $(top_builddir)/src/include/utils/probes.h
+$(SUBDIRS:%=%-recursive): $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src/include/catalog/schemapg.h $(top_builddir)/src/include/storage/lwlocknames.h $(top_builddir)/src/include/utils/fmgroids.h $(top_builddir)/src/include/utils/errcodes.h $(top_builddir)/src/include/utils/probes.h
 
 # run this unconditionally to avoid needing to know its dependencies here:
 submake-schemapg:
@@ -135,6 +135,9 @@ postgres.o: $(OBJS)
 parser/gram.h: parser/gram.y
 	$(MAKE) -C parser gram.h
 
+storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl storage/lmgr/lwlocknames.txt
+	$(MAKE) -C storage/lmgr lwlocknames.h
+
 utils/fmgroids.h: utils/Gen_fmgrtab.pl catalog/Catalog.pm $(top_srcdir)/src/include/catalog/pg_proc.h
 	$(MAKE) -C utils fmgroids.h
 
@@ -165,6 +168,11 @@ $(top_builddir)/src/include/catalog/schemapg.h: catalog/schemapg.h
	  cd '$(dir $@)' && rm -f $(notdir $@) && \
	  $(LN_S) $$prereqdir/$(notdir $<) .
 
+$(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
+	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
+	  cd '$(dir $@)' && rm -f $(notdir $@) && \
+	  $(LN_S) $$prereqdir/$(notdir $<) .
+
 $(top_builddir)/src/include/utils/errcodes.h: utils/errcodes.h
	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
	  cd '$(dir $@)' && rm -f $(notdir $@) && \
@@ -192,6 +200,7 @@ distprep:
 	$(MAKE) -C bootstrap	bootparse.c bootscanner.c
 	$(MAKE) -C catalog	schemapg.h postgres.bki postgres.description postgres.shdescription
 	$(MAKE) -C replication	repl_gram.c repl_scanner.c
+	$(MAKE) -C storage	lwlocknames.h
 	$(MAKE) -C utils	fmgrtab.c fmgroids.h errcodes.h
 	$(MAKE) -C utils/misc	guc-file.c
 	$(MAKE) -C utils/sort	qsort_tuple.c
@@ -307,6 +316,7 @@ maintainer-clean: distclean
 	  catalog/postgres.shdescription \
 	  replication/repl_gram.c \
 	  replication/repl_scanner.c \
+	  storage/lmgr/lwlocknames.h \
 	  utils/fmgroids.h \
 	  utils/fmgrtab.c \
 	  utils/errcodes.h \
diff --git a/src/backend/storage/lmgr/.gitignore b/src/backend/storage/lmgr/.gitignore
new file mode 100644
index 000..9355cae
--- /dev/null
+++ b/src/backend/storage/lmgr/.gitignore
@@ -0,0 +1,2 @@
+/lwlocknames.c
+/lwlocknames.h
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index e12a854..e941f2d 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -24,8 +24,17 @@ s_lock_test: s_lock.c $(top_builddir)/src/port/libpgport.a
 	$(CC) $(CPPFLAGS) $(CFLAGS) -DS_LOCK_TEST=1 $(srcdir)/s_lock.c \
 		$(TASPATH) -L $(top_builddir)/src/port -lpgport -o s_lock_test
 
+# see explanation in ../../parser/Makefile
+lwlocknames.c: lwlocknames.h ;
+
+lwlocknames.h: $(top_srcdir)/src/backend/storage/lmgr/lwlocknames.txt generate-lwlocknames.pl
	$(PERL) $(srcdir)/generate-lwlocknames.pl $<
+
 check: s_lock_test
 	./s_lock_test
 
-clean distclean maintainer-clean:
+clean distclean:
 	rm -f s_lock_test
+
+maintainer-clean: clean
+	rm -f lwlocknames.h
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
new 

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-04 Thread Robert Haas
On Tue, Aug 4, 2015 at 4:37 PM, Ildus Kurbangaliev
i.kurbangal...@postgrespro.ru wrote:
 A new version of the patch. I used your idea with macros, and with
 tranches, which allowed us to remove the array with names (they can be
 written directly to the corresponding tranche).

You seem not to have addressed a few of the points I brought up here:

http://www.postgresql.org/message-id/CA+TgmoaGqhah0VTamsfaOMaE9uOrCPYSXN8hCS9=wirupjs...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-04 Thread Robert Haas
On Tue, Aug 4, 2015 at 4:47 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Aug 4, 2015 at 4:37 PM, Ildus Kurbangaliev
 i.kurbangal...@postgrespro.ru wrote:
 A new version of the patch. I used your idea with macros, and with
 tranches, which allowed us to remove the array with names (they can be
 written directly to the corresponding tranche).

 You seem not to have addressed a few of the points I brought up here:

 http://www.postgresql.org/message-id/CA+TgmoaGqhah0VTamsfaOMaE9uOrCPYSXN8hCS9=wirupjs...@mail.gmail.com

More generally, I'd like to stop smashing all the things that need to
be done here into one patch.  We need to make some changes, such as
the one I proposed earlier today, to make it easier to properly
identify locks.  Let's talk about how to do that and agree on the
details.  Then, once that's done, let's do the main part of the work
afterwards, in a separate commit.  We're running through patch
versions at light speed here, but I'm not sure we're really building
consensus around how to do things.  The actual technical work here
isn't really the problem; that part is easy.  The hard part is
agreeing on the details of how it should work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-04 Thread Alvaro Herrera
Ildus Kurbangaliev wrote:

 A new version of the patch. I used your idea with macros, and with
 tranches, which allowed us to remove the array with names (they can be
 written directly to the corresponding tranche).

Just a bystander here, I haven't reviewed this patch at all, but I have
two questions,

1. have you tested this under -DEXEC_BACKEND ?  I wonder if those
initializations are going to work on Windows.

2. why keep the SLRU control locks as individual locks?  Surely we could
put them in the SlruCtl struct and get rid of a few individual lwlocks?

Also, I wonder if we shouldn't be doing this in two parts, one that
changes the underlying lwlock structure and another one to change
pg_stat_activity.


We have CppAsString() in c.h IIRC, which we use instead of # (we do use
## in a few places though).  I wonder if that stuff has any value
anymore.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-04 Thread Ildus Kurbangaliev

 On Aug 4, 2015, at 4:54 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 
 On 08/04/2015 03:15 PM, Robert Haas wrote:
 On Tue, Jul 28, 2015 at 3:28 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 * The patch requires that the LWLOCK_INDIVIDUAL_NAMES array is kept in sync
 with the list of individual locks in lwlock.h. Sooner or later someone will
 add an LWLock and forget to update the names-array. That needs to be made
 less error-prone, so that the names are maintained in the same place as the
 #defines. Perhaps something like rmgrlist.h.
 
 This is a good idea, but it's not easy to do in the style of
 rmgrlist.h, because I don't believe there's any way to define a macro
 that expands to a preprocessor directive.  Attached is a patch that
 instead generates the list of macros from a text file, and also
 generates an array inside lwlock.c with the lock names that gets used
 by the Trace_lwlocks stuff where applicable.
 
 Any objections to this solution to the problem?  If not, I'd like to
 go ahead and push this much.  I can't test the Windows changes
 locally, though, so it would be helpful if someone could check that
 out.
 
 A more low-tech solution would be to something like this in lwlocknames.c:
 
 static char *MainLWLockNames[NUM_INDIVIDUAL_LWLOCKS];
 
 /* Turn a pointer to one of the LWLocks in the main array into an index number */
 #define NAME_LWLOCK(l, name) MainLWLockNames[(l - MainLWLockArray)] = name
 
 InitLWLockNames()
 {
  NAME_LWLOCK(ShmemIndexLock, "ShmemIndexLock");
  NAME_LWLOCK(OidGenLock, "OidGenLock");
  ...
 }
 
 That would not be auto-generated, so you'd need to keep that list in sync 
 with lwlock.h, but it would be much better than the original patch because if 
 you forgot to add an entry in the names-array, the numbering of all the other 
 locks would not go wrong. And you could have a runtime check that complains 
 if there's an entry missing, like Ildus did in his latest patch.
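 
 [Editor's note: the runtime check mentioned here can be sketched in a few
 lines. The function name, the fixed lock count, and the printf are
 hypothetical; a real server would raise an elog/Assert at startup instead.]
 
 ```c
 #include <stdio.h>
 #include <stddef.h>
 
 #define NUM_INDIVIDUAL_LWLOCKS 3
 
 /* Entries left NULL are locks whose name was never registered. */
 static const char *MainLWLockNames[NUM_INDIVIDUAL_LWLOCKS];
 
 /* Count unnamed slots so a startup check can complain about them. */
 static int count_missing_names(void)
 {
     int missing = 0;
 
     for (int i = 0; i < NUM_INDIVIDUAL_LWLOCKS; i++)
         if (MainLWLockNames[i] == NULL)
             missing++;
     return missing;
 }
 
 int main(void)
 {
     MainLWLockNames[0] = "ShmemIndexLock";
     MainLWLockNames[1] = "OidGenLock";
     /* slot 2 intentionally left unnamed */
     printf("missing=%d\n", count_missing_names());
     return 0;
 }
 ```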
 
 I have no particular objection to your perl script either, though. I'll leave 
 it up to you.
 
 - Heikki
 

A new version of the patch. I used your idea with macros, and with tranches,
which allowed us to remove the array with names (they can be written directly
to the corresponding tranche).


Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



extend_pg_stat_activity_v7.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-03 Thread Ildus Kurbangaliev

On 07/28/2015 10:28 PM, Heikki Linnakangas wrote:

On 07/27/2015 01:20 PM, Ildus Kurbangaliev wrote:

Hello.
In the attached patch I've made a refactoring for tranches.
The prefix for them was extended, and I split LWLockAssign into two
functions (one with a tranche and a second for user-defined LWLocks).


This needs some work in order to be maintainable:

* The patch requires that the LWLOCK_INDIVIDUAL_NAMES array is kept in 
sync with the list of individual locks in lwlock.h. Sooner or later 
someone will add an LWLock and forget to update the names-array. That 
needs to be made less error-prone, so that the names are maintained in 
the same place as the #defines. Perhaps something like rmgrlist.h.


* The base tranches are a bit funny. They all have the same 
array_base, pointing to MainLWLockArray. If there are e.g. 5 clog 
buffer locks, I would expect the T_NAME() to return ClogBufferLocks 
for all of them, and T_ID() to return numbers between 0-4. But in 
reality, T_ID() will return something like 55-59.


Instead of passing a tranche-id to LWLockAssign(), I think it would be 
more clear to have a new function to allocate a contiguous block of 
lwlocks as a new tranche. It could then set the base correctly.


* Instead of having LWLOCK_INDIVIDUAL_NAMES to name individual 
locks, how about just giving each one of them a separate tranche?


* User manual needs to be updated to explain the new column in 
pg_stat_activity.


- Heikki



Hello. Thanks for the review. I've attached a new version of the patch.

It adds a new field in pg_stat_activity that shows the current wait in a backend.

I've done some refactoring of the LWLock tranche mechanism. In lwlock.c only
individual and user-defined LWLocks are created; other LWLocks are created by
the modules that need them. I think that is more logical (the user knows about
the module, not the module about all of its users). It also simplifies LWLock
acquirement.

Now each individual LWLock and each other group of LWLocks has its own
tranche, and can be easily identified. If somebody adds a new individual
LWLock and forgets to add its name, postgres will show a message. Individual
LWLocks are still allocated in one array, but their tranches point to a
particular index in the main array.

Sample:

b1=# select pid, wait_event from pg_stat_activity; \watch 1

 pid  |   wait_event
--+-
 7722 | Storage: READ
 7653 |
 7723 | Network: WRITE
 7725 | Network: READ
 7727 | Locks: Transaction
 7731 | Storage: READ
 7734 | Network: READ
 7735 | Storage: READ
 7739 | LWLocks: WALInsertLocks
 7738 | Locks: Transaction
 7740 | LWLocks: BufferMgrLocks
 7741 | Network: READ
 7742 | Network: READ
 7743 | Locks: Transaction


--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e64b7ef..2e4e67e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -630,6 +630,11 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
      <entry>True if this backend is currently waiting on a lock</entry>
     </row>
     <row>
+     <entry><structfield>wait_event</structfield></entry>
+     <entry><type>text</type></entry>
+     <entry>Name of a current wait event in backend</entry>
+    </row>
+    <row>
      <entry><structfield>state</structfield></entry>
      <entry><type>text</type></entry>
      <entry>Current overall state of this backend.
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3a58f1e..10c25cf 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -457,7 +457,8 @@ CLOGShmemInit(void)
 {
 	ClogCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  CLogControlLock, "pg_clog");
+				  CLogControlLock, "pg_clog",
+				  "CLogBufferLocks");
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 5ad35c0..dd7578f 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -466,7 +466,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs Ctl", CommitTsShmemBuffers(), 0,
-				  CommitTsControlLock, "pg_commit_ts");
+				  CommitTsControlLock, "pg_commit_ts",
+				  "CommitTSBufferLocks");
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
 									 sizeof(CommitTimestampShared),
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1933a87..b905c59 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1842,10 +1842,12 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset Ctl", NUM_MXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetControlLock, "pg_multixact/offsets");
+				  MultiXactOffsetControlLock, "pg_multixact/offsets",
+				  "MultiXactOffsetBufferLocks");
 	SimpleLruInit(MultiXactMemberCtl,