Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-12-21 Thread Michael Paquier
On Mon, Dec 21, 2015 at 4:50 PM, Jesper Pedersen
 wrote:
> On 12/18/2015 01:16 PM, Robert Haas wrote:
>>
>> Is this just for informational purposes, or is this something you are
>> looking to have committed?  I originally thought the former, but now
>> I'm wondering if I misinterpreted your intent.  I have a hard time
>> getting excited about committing something that would, unless I'm
>> missing something, pretty drastically increase the overhead of running
>> with LWLOCK_STATS...
>>
>
> Yeah, so unless other people using LWLOCK_STATS find the additional
> information of use (w/ the extra overhead), I think we can mark it as
> "Returned with feedback" or "Rejected".

Marked as rejected for this CF then; the extra overhead is not something to
ignore. There has been a fair amount of infrastructure work done, by the way,
thanks to the impetus you provided.

> Alternatively, I can redo the patch to require an additional #define - e.g.
> LWLOCK_STATS_QUEUE_SIZES

Feel free to do so if you wish, that may be interesting to see what this gives.
-- 
Michael




Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-12-20 Thread Jesper Pedersen

On 12/18/2015 01:16 PM, Robert Haas wrote:

Is this just for informational purposes, or is this something you are
looking to have committed?  I originally thought the former, but now
I'm wondering if I misinterpreted your intent.  I have a hard time
getting excited about committing something that would, unless I'm
missing something, pretty drastically increase the overhead of running
with LWLOCK_STATS...



Yeah, so unless other people using LWLOCK_STATS find the additional 
information of use (w/ the extra overhead), I think we can mark it as 
"Returned with feedback" or "Rejected".


Alternatively, I can redo the patch to require an additional #define - e.g.
LWLOCK_STATS_QUEUE_SIZES.
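
A minimal sketch of what that guard could look like (the define name is only
a suggestion and this is not a tested hunk; the fields are the ones from the
current patch):

#ifdef LWLOCK_STATS
typedef struct lwlock_stats
{
	lwlock_stats_key key;
	int			sh_acquire_count;
	int			ex_acquire_count;
	int			block_count;
	int			dequeue_self_count;
	int			spin_delay_count;
#ifdef LWLOCK_STATS_QUEUE_SIZES
	int			sh_acquire_max;		/* longest run of shared waiters seen */
	int			ex_acquire_max;		/* longest run of exclusive waiters seen */
	int			max_waiters;		/* longest wait queue observed */
#endif
}	lwlock_stats;
#endif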


Best regards,
 Jesper





Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-12-18 Thread Robert Haas
On Wed, Dec 16, 2015 at 5:02 AM, Jesper Pedersen
 wrote:
> On 09/16/2015 12:44 PM, Jesper Pedersen wrote:
>>
>> So, I think there is some value in keeping this information separate.
>>
>
> Just a rebased patch after the excellent LWLockTranche work.
>
> And a new sample report with -c/-j 200 -M prepared.

Is this just for informational purposes, or is this something you are
looking to have committed?  I originally thought the former, but now
I'm wondering if I misinterpreted your intent.  I have a hard time
getting excited about committing something that would, unless I'm
missing something, pretty drastically increase the overhead of running
with LWLOCK_STATS...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-16 Thread Jesper Pedersen

On 09/15/2015 03:51 PM, Jesper Pedersen wrote:

It
would be nice to get a better sense of how *long* we block on various
locks.  It's hard to tell whether some other lock might have fewer
blocking events but for a much longer average duration.





I did a run with the attached patch, recording the time from 
LWLockQueueSelf() until the lock is acquired.
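
In other words, roughly (sketch only, not the exact hunk from the attached
patch):

	/* in LWLockQueueSelf(), before joining the wait queue */
	lwstats->timestamp = GetTimeInMillis();

	/* after the wait, once the lock has been acquired */
	lwstats->block_time += GetTimeInMillis() - lwstats->timestamp;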


Doing block_time / block_count basically only shows "main 0" -- it's
called "unassigned:0"; it also shows up in the max exclusive report. 
Where it is coming from is another question, since it shouldn't be in use.


Due to the overhead of gettimeofday() I think that _v1 is better to 
consider.


Best regards,
 Jesper

*** /tmp/K1hW5V_lwlock.c	2015-09-16 09:53:26.682803203 -0400
--- src/backend/storage/lmgr/lwlock.c	2015-09-16 08:59:11.232712749 -0400
***
*** 92,97 
--- 92,98 
  #include "utils/memutils.h"
  
  #ifdef LWLOCK_STATS
+ #include <sys/time.h>
  #include "utils/hsearch.h"
  #endif
  
***
*** 163,176 
--- 164,190 
  {
  	lwlock_stats_key key;
  	int			sh_acquire_count;
+ 	int			sh_acquire_max;
  	int			ex_acquire_count;
+ 	int			ex_acquire_max;
  	int			block_count;
+ 	double  block_time;
+ 	double  timestamp;
  	int			dequeue_self_count;
  	int			spin_delay_count;
+ 	int			max_waiters;
  }	lwlock_stats;
  
  static HTAB *lwlock_stats_htab;
  static lwlock_stats lwlock_stats_dummy;
+ 
+ static double
+ GetTimeInMillis(void)
+ {
+ 	struct timeval t;
+ 	gettimeofday(&t, NULL);
+ 	return (t.tv_sec + (t.tv_usec / 1000000.0)) * 1000.0;
+ }
  #endif
  
  #ifdef LOCK_DEBUG
***
*** 297,308 
  
  	while ((lwstats = (lwlock_stats *) hash_seq_search(&scan)) != NULL)
  	{
! 		fprintf(stderr,
! "PID %d lwlock %s %d: shacq %u exacq %u blk %u spindelay %u dequeue self %u\n",
! MyProcPid, LWLockTrancheArray[lwstats->key.tranche]->name,
! lwstats->key.instance, lwstats->sh_acquire_count,
! lwstats->ex_acquire_count, lwstats->block_count,
! lwstats->spin_delay_count, lwstats->dequeue_self_count);
  	}
  
  	LWLockRelease(&MainLWLockArray[0].lock);
--- 311,332 
  
  	while ((lwstats = (lwlock_stats *) hash_seq_search(&scan)) != NULL)
  	{
! 		if (lwstats->key.tranche == 0 && lwstats->key.instance < NUM_INDIVIDUAL_LWLOCKS)
! 			fprintf(stderr,
! 	"PID %d lwlock %s: shacq %u shmax %u exacq %u exmax %u blk %u blktime %u spindelay %u dequeue self %u maxw %u\n",
! 	MyProcPid, MainLWLockNames[lwstats->key.instance],
! 	lwstats->sh_acquire_count, lwstats->sh_acquire_max,
! 	lwstats->ex_acquire_count, lwstats->ex_acquire_max,
! 	lwstats->block_count, (unsigned int)lwstats->block_time, lwstats->spin_delay_count, lwstats->dequeue_self_count,
! 	lwstats->max_waiters);
! 		else
! 			fprintf(stderr,
! 	"PID %d lwlock %s %d: shacq %u shmax %u exacq %u exmax %u blk %u blktime %u spindelay %u dequeue self %u maxw %u\n",
! 	MyProcPid, LWLockTrancheArray[lwstats->key.tranche]->name,
! 	lwstats->key.instance, lwstats->sh_acquire_count, lwstats->sh_acquire_max,
! 	lwstats->ex_acquire_count, lwstats->ex_acquire_max,
! 	lwstats->block_count, (unsigned int)lwstats->block_time, lwstats->spin_delay_count, lwstats->dequeue_self_count,
! 	lwstats->max_waiters);
  	}
  
  	LWLockRelease(&MainLWLockArray[0].lock);
***
*** 330,339 
--- 354,368 
  	if (!found)
  	{
  		lwstats->sh_acquire_count = 0;
+ 		lwstats->sh_acquire_max = 0;
  		lwstats->ex_acquire_count = 0;
+ 		lwstats->ex_acquire_max = 0;
  		lwstats->block_count = 0;
+ 		lwstats->block_time = 0.0;
+ 		lwstats->timestamp = 0.0;
  		lwstats->dequeue_self_count = 0;
  		lwstats->spin_delay_count = 0;
+ 		lwstats->max_waiters = 0;
  	}
  	return lwstats;
  }
***
*** 774,779 
--- 803,811 
  LWLockQueueSelf(LWLock *lock, LWLockMode mode)
  {
  #ifdef LWLOCK_STATS
+ 	bool include;
+ 	int counter, size;
+ 	dlist_iter iter;
  	lwlock_stats *lwstats;
  
  	lwstats = get_lwlock_stats_entry(lock);
***
*** 791,797 
--- 823,870 
  		elog(PANIC, "queueing for lock while waiting on another one");
  
  #ifdef LWLOCK_STATS
+ 	lwstats->timestamp = GetTimeInMillis();
  	lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+ 
+ 	/*
+ 	 * We scan the list of waiters from the back in order to find
+ 	 * out how many of the same lock type are waiting for a lock.
+ 	 * Similar types have the potential to be grouped together.
+ 	 *
+ 	 * We also count the number of waiters, including ourselves.
+ 	 */
+ 	include = true;
+ 	size = 1;
+ 	counter = 1;
+ 
+ 	dlist_reverse_foreach(iter, &lock->waiters)
+ 	{
+ 		if (include)
+ 		{
+ 			PGPROC	   *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ 
+ 			if (waiter->lwWaitMode == mode)
+ counter += 1;
+ 			else
+ include = false;
+ 		}
+ 		
+ 		size += 1;
+ 	}
+ 
+ 	if (mode == LW_EXCLUSIVE || mode == LW_WAIT_UNTIL_FREE)
+ 	{
+ 		if (counter > lwstats->ex_acquire_max)
+ 			lwstats->ex_acquire_max = counter;
+ 	}
+ 	else if (mode == LW_SHARED)
+ 	{
+ 		if (counter > 

Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-16 Thread Jesper Pedersen

On 09/16/2015 10:13 AM, Jesper Pedersen wrote:

On 09/15/2015 03:51 PM, Jesper Pedersen wrote:

It
would be nice to get a better sense of how *long* we block on various
locks.  It's hard to tell whether some other lock might have fewer
blocking events but for a much longer average duration.





I did a run with the attached patch, recording the time from
LWLockQueueSelf() until the lock is acquired.



Note that the time is recorded before the spin lock is acquired, so it is
not ideal.



Doing block_time / block_count basically only shows "main 0" -- it's
called "unassigned:0"; it also shows up in the max exclusive report.
Where it is coming from is another question, since it shouldn't be in use.



Likely from LWLOCK_STATS' own lwlock.c::print_lwlock_stats, which would 
make sense.


Best regards,
 Jesper





Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-16 Thread Jesper Pedersen

On 09/16/2015 10:25 AM, Jesper Pedersen wrote:

Likely from LWLOCK_STATS' own lwlock.c::print_lwlock_stats, which would
make sense.



Version 3 attached, which ignores entries from MainLWLockArray[0].

Best regards,
 Jesper

*** /tmp/NTwtmh_lwlock.c	2015-09-16 10:34:02.955957192 -0400
--- src/backend/storage/lmgr/lwlock.c	2015-09-16 10:32:18.088843413 -0400
***
*** 163,172 
--- 163,175 
  {
  	lwlock_stats_key key;
  	int			sh_acquire_count;
+ 	int			sh_acquire_max;
  	int			ex_acquire_count;
+ 	int			ex_acquire_max;
  	int			block_count;
  	int			dequeue_self_count;
  	int			spin_delay_count;
+ 	int			max_waiters;
  }	lwlock_stats;
  
  static HTAB *lwlock_stats_htab;
***
*** 297,308 
  
  	while ((lwstats = (lwlock_stats *) hash_seq_search(&scan)) != NULL)
  	{
! 		fprintf(stderr,
! "PID %d lwlock %s %d: shacq %u exacq %u blk %u spindelay %u dequeue self %u\n",
! MyProcPid, LWLockTrancheArray[lwstats->key.tranche]->name,
! lwstats->key.instance, lwstats->sh_acquire_count,
! lwstats->ex_acquire_count, lwstats->block_count,
! lwstats->spin_delay_count, lwstats->dequeue_self_count);
  	}
  
  	LWLockRelease(&MainLWLockArray[0].lock);
--- 300,325 
  
  	while ((lwstats = (lwlock_stats *) hash_seq_search(&scan)) != NULL)
  	{
! 		if (lwstats->key.tranche == 0 && lwstats->key.instance < NUM_INDIVIDUAL_LWLOCKS)
! 		{
! 			/* We ignore information from MainLWLockArray[0] since it isn't in real use */
! 			if (lwstats->key.instance != 0)
! fprintf(stderr,
! 		"PID %d lwlock %s: shacq %u shmax %u exacq %u exmax %u blk %u spindelay %u dequeue self %u maxw %u\n",
! 		MyProcPid, MainLWLockNames[lwstats->key.instance],
! 		lwstats->sh_acquire_count, lwstats->sh_acquire_max,
! 		lwstats->ex_acquire_count, lwstats->ex_acquire_max,
! 		lwstats->block_count, lwstats->spin_delay_count, lwstats->dequeue_self_count,
! 		lwstats->max_waiters);
! 		}
! 		else
! 			fprintf(stderr,
! 	"PID %d lwlock %s %d: shacq %u shmax %u exacq %u exmax %u blk %u spindelay %u dequeue self %u maxw %u\n",
! 	MyProcPid, LWLockTrancheArray[lwstats->key.tranche]->name,
! 	lwstats->key.instance, lwstats->sh_acquire_count, lwstats->sh_acquire_max,
! 	lwstats->ex_acquire_count, lwstats->ex_acquire_max,
! 	lwstats->block_count, lwstats->spin_delay_count, lwstats->dequeue_self_count,
! 	lwstats->max_waiters);
  	}
  
  	LWLockRelease(&MainLWLockArray[0].lock);
***
*** 330,339 
--- 347,359 
  	if (!found)
  	{
  		lwstats->sh_acquire_count = 0;
+ 		lwstats->sh_acquire_max = 0;
  		lwstats->ex_acquire_count = 0;
+ 		lwstats->ex_acquire_max = 0;
  		lwstats->block_count = 0;
  		lwstats->dequeue_self_count = 0;
  		lwstats->spin_delay_count = 0;
+ 		lwstats->max_waiters = 0;
  	}
  	return lwstats;
  }
***
*** 774,779 
--- 794,802 
  LWLockQueueSelf(LWLock *lock, LWLockMode mode)
  {
  #ifdef LWLOCK_STATS
+ 	bool include;
+ 	int counter, size;
+ 	dlist_iter iter;
  	lwlock_stats *lwstats;
  
  	lwstats = get_lwlock_stats_entry(lock);
***
*** 792,797 
--- 815,860 
  
  #ifdef LWLOCK_STATS
  	lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+ 
+ 	/*
+ 	 * We scan the list of waiters from the back in order to find
+ 	 * out how many of the same lock type are waiting for a lock.
+ 	 * Similar types have the potential to be grouped together.
+ 	 *
+ 	 * We also count the number of waiters, including ourselves.
+ 	 */
+ 	include = true;
+ 	size = 1;
+ 	counter = 1;
+ 
+ 	dlist_reverse_foreach(iter, &lock->waiters)
+ 	{
+ 		if (include)
+ 		{
+ 			PGPROC	   *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ 
+ 			if (waiter->lwWaitMode == mode)
+ counter += 1;
+ 			else
+ include = false;
+ 		}
+ 		
+ 		size += 1;
+ 	}
+ 
+ 	if (mode == LW_EXCLUSIVE || mode == LW_WAIT_UNTIL_FREE)
+ 	{
+ 		if (counter > lwstats->ex_acquire_max)
+ 			lwstats->ex_acquire_max = counter;
+ 	}
+ 	else if (mode == LW_SHARED)
+ 	{
+ 		if (counter > lwstats->sh_acquire_max)
+ 			lwstats->sh_acquire_max = counter;
+ 	}
+ 
+ 	if (size > lwstats->max_waiters)
+ 		lwstats->max_waiters = size;
  #else
  	SpinLockAcquire(&lock->mutex);
  #endif
***
*** 943,951 
--- 1006,1022 
  #ifdef LWLOCK_STATS
  	/* Count lock acquisition attempts */
  	if (mode == LW_EXCLUSIVE)
+ 	{
  		lwstats->ex_acquire_count++;
+ 		if (lwstats->ex_acquire_max == 0)
+ 			lwstats->ex_acquire_max = 1;
+ 	}
  	else
+ 	{
  		lwstats->sh_acquire_count++;
+ 		if (lwstats->sh_acquire_max == 0)
+ 			lwstats->sh_acquire_max = 1;
+ 	}
  #endif   /* LWLOCK_STATS */
  
  	/*



Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-16 Thread Jesper Pedersen

Hi,

On 09/16/2015 12:26 PM, Andres Freund wrote:

On 2015-09-16 10:37:43 -0400, Jesper Pedersen wrote:

   #ifdef LWLOCK_STATS
lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+
+   /*
+* We scan the list of waiters from the back in order to find
+* out how many of the same lock type are waiting for a lock.
+* Similar types have the potential to be grouped together.
+*


Except for LW_WAIT_UNTIL_FREE there shouldn't be much in terms of
grouping? Can't you instead iterate and count
counter[waiter->lwWaitMode]++ or so?



The sample report (-c/-j 80) shows in the exclusive report that
CLogControlLock has 79 in series, XidGenLock has 80, and ProcArrayLock
only 7. For shared, CLogControlLock has 63 and ProcArrayLock has 74
(which is also its max queue size).


So, I think there is some value in keeping this information separate.

Best regards,
 Jesper





Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-16 Thread Andres Freund
Hi,

On 2015-09-16 10:37:43 -0400, Jesper Pedersen wrote:
>   #ifdef LWLOCK_STATS
> 	lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> + 
> + /*
> +  * We scan the list of waiters from the back in order to find
> +  * out how many of the same lock type are waiting for a lock.
> +  * Similar types have the potential to be grouped together.
> +  *

Except for LW_WAIT_UNTIL_FREE there shouldn't be much in terms of
grouping? Can't you instead iterate and count
counter[waiter->lwWaitMode]++ or so?
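
Something like this, i.e. a single pass that buckets waiters by mode
(untested sketch; the three LWLockMode values are the ones from lwlock.h):

	int			counters[3] = {0, 0, 0};	/* LW_EXCLUSIVE, LW_SHARED, LW_WAIT_UNTIL_FREE */
	int			size = 1;					/* count ourselves as well */
	dlist_iter	iter;

	dlist_foreach(iter, &lock->waiters)
	{
		PGPROC	   *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);

		counters[waiter->lwWaitMode]++;
		size++;
	}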

Greetings,

Andres Freund




Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-16 Thread Robert Haas
On Wed, Sep 16, 2015 at 10:13 AM, Jesper Pedersen
 wrote:
> Doing block_time / block_count basically only shows "main 0" -- it's called
> "unassigned:0"; it also shows up in the max exclusive report. Where it is
> coming from is another question, since it shouldn't be in use.

LWLOCK_STATS itself uses that for something, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-15 Thread Amit Kapila
On Wed, Sep 16, 2015 at 1:21 AM, Jesper Pedersen 
wrote:
>
> On 09/15/2015 03:42 PM, Robert Haas wrote:
>>
>> I haven't really, just the email.  But it seems like a neat concept.
>> So if I understand this correctly:
>>
>> 74.05% of spin delays are attributable to CLogControlLock, 20.01% to
>> ProcArrayLock, and 3.39% to XidGenLock.  Incredibly, the queue length
>> reaches the number of backends (80) for both CLogControlLock and
>> XidGenLock, but for ProcArrayLock it "only" reaches a length of 75.
>>
>
> 74, as the "real" information is above the "call stack". The spin delay
> report is filtered on > 0 - so only LWLocks that have any spin delay are
> included.
>
> Only the weight report shows all locks.
>
>> This seems to suggest that relieving pressure on CLogControlLock would
>> be very beneficial, but I'm not sure what else to make of it.
>
>
> I have done some runs with Amit's CLogControlLock patch, but currently it
> doesn't show any improvements. I'm trying to figure out why.
>

Thanks for testing the patch. Even if you are not seeing an improvement,
feel free to report it with details. I can also look into why it is not
showing an improvement.

Two important points to take care of while testing that patch are that the
data should fit in shared_buffers and synchronous_commit should be on; for
other kinds of workloads it might or might not give much benefit.
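
For example (illustrative values only; size shared_buffers to the data set):

	shared_buffers = 8GB
	synchronous_commit = on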




With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-15 Thread Robert Haas
On Tue, Sep 15, 2015 at 10:27 AM, Jesper Pedersen
 wrote:
> Hi,
>
> I have been using the attached patch to look at how each LWLock relates to
> each other in various types of runs.
>
> The patch adds the following fields to a LWLOCK_STATS build:
>
>  sh_acquire_max (int):
>
>The maximum shared locks in series for the lock
>
>  ex_acquire_max (int):
>
>The maximum exclusive locks in series for the lock
>
>  max_waiters (int):
>
>The maximum numbers of waiters
>
> Also attached is a sample report using FlameGraphs from a pgbench run using
>
>  -c 80 -j 80 -T 300
>
> See README for additional details.
>
> If there is an interest I'll add the patch to the next CommitFest.
>
> Thanks for considering, and any feedback is most welcomed.

Seems neat, but I can't understand how to read the flame graphs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-15 Thread Robert Haas
On Tue, Sep 15, 2015 at 3:30 PM, Jesper Pedersen
 wrote:
> X-axis is sort of "up in the air" with flame graphs -- similar call stacks
> are grouped together, and here it is the queue size.
>
> Y-axis is the lock queue size -- e.g. CLogControlLock is "max'ed" out, since
> there is a queue size of 80 with the -c/-j 80 run.

Oh, OK, I think I sorta get it now.  I've seen flamegraphs before, but of CPU.

> Thanks for looking at the patch.

I haven't really, just the email.  But it seems like a neat concept.
So if I understand this correctly:

74.05% of spin delays are attributable to CLogControlLock, 20.01% to
ProcArrayLock, and 3.39% to XidGenLock.  Incredibly, the queue length
reaches the number of backends (80) for both CLogControlLock and
XidGenLock, but for ProcArrayLock it "only" reaches a length of 75.

44.46% of blocking is attributable to CLogControlLock and 1.34% to
XidGenLock.  32.62% is due to ProcArrayLock.  The WALInsertLocks
account for 2.34% each, so 2.34% * 8 = 18.72%.  The WALInsertLocks have
maximum queue depths between 28 and 42.

This seems to suggest that relieving pressure on CLogControlLock would
be very beneficial, but I'm not sure what else to make of it.  It
would be nice to get a better sense of how *long* we block on various
locks.  It's hard to tell whether some other lock might have fewer
blocking events but for a much longer average duration.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-15 Thread Jesper Pedersen

On 09/15/2015 03:42 PM, Robert Haas wrote:

I haven't really, just the email.  But it seems like a neat concept.
So if I understand this correctly:

74.05% of spin delays are attributable to CLogControlLock, 20.01% to
ProcArrayLock, and 3.39% to XidGenLock.  Incredibly, the queue length
reaches the number of backends (80) for both CLogControlLock and
XidGenLock, but for ProcArrayLock it "only" reaches a length of 75.



74, as the "real" information is above the "call stack". The spin delay 
report is filtered on > 0 - so only LWLocks that have any spin delay are
included.


Only the weight report shows all locks.


This seems to suggest that relieving pressure on CLogControlLock would
be very beneficial, but I'm not sure what else to make of it.


I have done some runs with Amit's CLogControlLock patch, but currently 
it doesn't show any improvements. I'm trying to figure out why.



It
would be nice to get a better sense of how *long* we block on various
locks.  It's hard to tell whether some other lock might have fewer
blocking events but for a much longer average duration.



Yes, that would be interesting.

Best regards,
 Jesper





Re: [HACKERS] Additional LWLOCK_STATS statistics

2015-09-15 Thread Jesper Pedersen

On 09/15/2015 03:11 PM, Robert Haas wrote:

If there is an interest I'll add the patch to the next CommitFest.

Thanks for considering, and any feedback is most welcomed.


Seems neat, but I can't understand how to read the flame graphs.



X-axis is sort of "up in the air" with flame graphs -- similar call 
stacks are grouped together, and here it is the queue size.


Y-axis is the lock queue size -- e.g. CLogControlLock is "max'ed" out, 
since there is a queue size of 80 with the -c/-j 80 run.


The width of each column shows how the locks are "distributed" within
the report. Mousing over a lock name gives the % of the entire
report, e.g. CLogControlLock has 74% of spin delays. Note that some of
the reports are filtered in order to eliminate "unnecessary" information
-- see the README.


You would need to zoom into some of the information - left click - and
then reset the zoom afterwards.


I can change the reports if something is more helpful.

Flame graphs for CPU profiling are more common, like

https://dl.dropboxusercontent.com/u/99317937/postgresql/96S-20150808-r30t-17353.svg

Thanks for looking at the patch.

Best regards,
 Jesper


