Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-03-20 Thread Andres Freund
On 2015-02-21 16:09:02 -0500, Andrew Dunstan wrote:
 I think all the outstanding issues are fixed in this patch.

Do you plan to push this? I don't see a benefit in delaying things
any further...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-03-14 Thread Petr Jelinek

On 22/02/15 01:59, Petr Jelinek wrote:

On 21/02/15 22:09, Andrew Dunstan wrote:


On 02/16/2015 09:05 PM, Petr Jelinek wrote:


I found one more issue with the 1.2--1.3 upgrade script, the DROP
FUNCTION pg_stat_statements(); should be DROP FUNCTION
pg_stat_statements(bool); since in 1.2 the function identity has
changed.





I think all the outstanding issues are fixed in this patch.



I think PGSS_FILE_HEADER should also be updated; otherwise it's good.




I see you marked this as 'needs review'; I am marking it as 'ready for 
committer', as it looks good to me.


Just don't forget to update the PGSS_FILE_HEADER while committing (I 
think that one can be treated the same way as catversion, i.e. updated by 
the committer and not in the patch submission).
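
For anyone following along: PGSS_FILE_HEADER is the magic number pg_stat_statements
writes at the start of its on-disk stats file, so bumping it makes files written by
older versions unreadable and starts the stats from a clean slate for the new fields.
A minimal sketch of the kind of one-line change meant here (the value shown is purely
hypothetical):

    /* in contrib/pg_stat_statements/pg_stat_statements.c */
    static const uint32 PGSS_FILE_HEADER = 0x20150201;   /* hypothetical new magic value */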


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-21 Thread Petr Jelinek

On 21/02/15 22:09, Andrew Dunstan wrote:


On 02/16/2015 09:05 PM, Petr Jelinek wrote:


I found one more issue with the 1.2--1.3 upgrade script, the DROP
FUNCTION pg_stat_statements(); should be DROP FUNCTION
pg_stat_statements(bool); since in 1.2 the function identity has changed.





I think all the outstanding issues are fixed in this patch.



I think PGSS_FILE_HEADER should also be updated; otherwise it's good.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-21 Thread Andrew Dunstan


On 02/16/2015 09:05 PM, Petr Jelinek wrote:

On 17/02/15 02:57, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:
We definitely want this feature, I wished to have this info many 
times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



I think so too.

I found one more issue with the 1.2--1.3 upgrade script, the DROP 
FUNCTION pg_stat_statements(); should be DROP FUNCTION 
pg_stat_statements(bool); since in 1.2 the function identity has changed.






I think all the outstanding issues are fixed in this patch.

cheers

andrew


diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 2709909..975a637 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -4,8 +4,9 @@ MODULE_big = pg_stat_statements
 OBJS = pg_stat_statements.o $(WIN32RES)
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
-	pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.3.sql pg_stat_statements--1.2--1.3.sql \
+	pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \
+	pg_stat_statements--unpackaged--1.0.sql
 PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql
new file mode 100644
index 000..a56f151
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql
@@ -0,0 +1,47 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.3'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+OUT userid oid,
+OUT dbid oid,
+OUT queryid bigint,
+OUT query text,
+OUT calls int8,
+OUT total_time float8,
+OUT min_time float8,
+OUT max_time float8,
+OUT mean_time float8,
+OUT stddev_time float8,
+OUT rows int8,
+OUT shared_blks_hit int8,
+OUT shared_blks_read int8,
+OUT shared_blks_dirtied int8,
+OUT shared_blks_written int8,
+OUT local_blks_hit int8,
+OUT local_blks_read int8,
+OUT local_blks_dirtied int8,
+OUT local_blks_written int8,
+OUT temp_blks_read int8,
+OUT temp_blks_written int8,
+OUT blk_read_time float8,
+OUT blk_write_time float8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_3'
+LANGUAGE C STRICT VOLATILE;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
deleted file mode 100644
index 5bfa9a5..000
--- a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
+++ /dev/null
@@ -1,44 +0,0 @@
-/* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
-
--- Register functions.
-CREATE FUNCTION pg_stat_statements_reset()
-RETURNS void
-AS 'MODULE_PATHNAME'
-LANGUAGE C;
-
-CREATE FUNCTION pg_stat_statements(IN showtext boolean,
-OUT userid oid,
-OUT dbid oid,
-OUT queryid bigint,
-OUT query text,
-OUT calls int8,
-OUT total_time float8,
-OUT rows int8,
-OUT shared_blks_hit int8,
-OUT shared_blks_read int8,
-OUT shared_blks_dirtied int8,
-OUT shared_blks_written int8,
-OUT local_blks_hit int8,
-OUT local_blks_read int8,
-OUT local_blks_dirtied int8,
-OUT local_blks_written int8,
-OUT temp_blks_read int8,
-OUT temp_blks_written int8,
-OUT blk_read_time float8,
-OUT blk_write_time float8
-)
-RETURNS SETOF record
-AS 'MODULE_PATHNAME', 'pg_stat_statements_1_2'
-LANGUAGE C STRICT VOLATILE;
-
--- Register a view on the function for ease of use.
-CREATE VIEW pg_stat_statements AS
-  SELECT * FROM pg_stat_statements(true);
-
-GRANT SELECT ON pg_stat_statements TO PUBLIC;
-
--- Don't want this to be available to non-superusers.
-REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
diff --git 

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-19 Thread David Fetter
On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote:
 On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan and...@dunslane.net wrote:
  On 02/18/2015 08:34 PM, David Fetter wrote:
 
  On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
 
  On 1/20/15 6:32 PM, David G Johnston wrote:
 
  In fact, as far as the database knows, the values provided to this
  function do represent an entire population and such a correction
  would be unnecessary.  I guess it boils down to whether future
  queries are considered part of the population or whether the
  population changes upon each query being run and thus we are
  calculating the ever-changing population variance.
 
  I think we should be calculating the population variance.
 
  Why population variance and not sample variance?  In distributions
  where the second moment about the mean exists, it's an unbiased
  estimator of the variance.  In this, it's different from the
  population variance.
 
  Because we're actually measuring the whole population, and not a sample?

We're not.  We're taking a sample, which is to say past measurements,
and using it to make inferences about the population, which includes
all queries in the future.

 The key incorrect word in David Fetter's statement is estimator.  We are
 not estimating anything but rather providing descriptive statistics for a
 defined population.

See above.

 Users extrapolate that the next member added to the population would be
 expected to conform to this statistical description without bias (though
 see below).  We can also then define the new population by including this
 new member and generating new descriptive statistics (which allows for
 evolution to be captured in the statistics).
 
 Currently (I think) we allow the end user to kill off the entire population
 and build up from scratch, so that while, in the short term, the ability to
 predict the attributes of future members is limited, once the population has
 reached a statistically significant size new predictions will no longer
 be skewed by population members whose attributes were defined in an older and
 possibly significantly different environment.  In theory it would be nice
 to be able to give the user the ability to specify - by time or percentage
 - a subset of the population to leave alive.

I agree that stale numbers can fuzz things in a way we don't like, but
let's not creep too much feature in here.

 Actual time-weighted sampling would be an alternative but likely one
 significantly more difficult to accomplish.

Right.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-19 Thread David G. Johnston
On Thu, Feb 19, 2015 at 11:10 AM, David Fetter da...@fetter.org wrote:

 On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote:
  On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan and...@dunslane.net
 wrote:
   On 02/18/2015 08:34 PM, David Fetter wrote:
  
   On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
  
   On 1/20/15 6:32 PM, David G Johnston wrote:
  
   In fact, as far as the database knows, the values provided to this
   function do represent an entire population and such a correction
   would be unnecessary.  I guess it boils down to whether future
   queries are considered part of the population or whether the
   population changes upon each query being run and thus we are
   calculating the ever-changing population variance.
 
   I think we should be calculating the population variance.
 
   Why population variance and not sample variance?  In distributions
   where the second moment about the mean exists, it's an unbiased
   estimator of the variance.  In this, it's different from the
   population variance.
 
   Because we're actually measuring the whole population, and not a
 sample?

 We're not.  We're taking a sample, which is to say past measurements,
 and using it to make inferences about the population, which includes
 all queries in the future.


All past measurements do not qualify as a random sample of a
population made up of all past measurements and any potential members that
may be added in the future.  Without the random-sample aspect you throw
away all pretense of avoiding bias, so you might as well just call the
totality of the past measurements the population, describe them using
population descriptive statistics, and call it a day.

For large populations it isn't going to matter anyway but for small
populations the difference of one in the divisor seems like it would make
the past performance look worse than it actually was.
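
To put a number on that: for a tiny sample of execution times {1, 2, 3, 4}
(mean 2.5) the sum of squared differences is 2.25 + 0.25 + 0.25 + 2.25 = 5, so

  population variance = 5 / 4 = 1.25,    stddev_pop  ~= 1.118
  sample variance     = 5 / 3 ~= 1.667,  stddev_samp ~= 1.291

i.e. with only four calls the n-1 divisor inflates the reported spread by
roughly 15%; after a few hundred calls the two are practically indistinguishable.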

David J.


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-18 Thread Andrew Dunstan


On 02/17/2015 08:21 PM, Peter Eisentraut wrote:

On 1/20/15 6:32 PM, David G Johnston wrote:

In fact, as far as
the database knows, the values provided to this function do represent an
entire population and such a correction would be unnecessary.  I guess it
boils down to whether future queries are considered part of the population
or whether the population changes upon each query being run and thus we are
calculating the ever-changing population variance.

I think we should be calculating the population variance.  We are
clearly taking the population to be all past queries (from the last
reset point).  Otherwise calculating min and max wouldn't make sense.




The difference is likely to be very small in any case where you actually 
want to examine the standard deviation, so I feel we're rather arguing 
about how many angels can fit on the head of a pin, but if this is the 
consensus I'll change it.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-18 Thread Andrew Dunstan


On 02/18/2015 08:34 PM, David Fetter wrote:

On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:

On 1/20/15 6:32 PM, David G Johnston wrote:

In fact, as far as the database knows, the values provided to this
function do represent an entire population and such a correction
would be unnecessary.  I guess it boils down to whether future
queries are considered part of the population or whether the
population changes upon each query being run and thus we are
calculating the ever-changing population variance.

I think we should be calculating the population variance.

Why population variance and not sample variance?  In distributions
where the second moment about the mean exists, it's an unbiased
estimator of the variance.  In this, it's different from the
population variance.




Because we're actually measuring the whole population, and not a sample?

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-18 Thread David Fetter
On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
 On 1/20/15 6:32 PM, David G Johnston wrote:
  In fact, as far as the database knows, the values provided to this
  function do represent an entire population and such a correction
  would be unnecessary.  I guess it boils down to whether future
  queries are considered part of the population or whether the
  population changes upon each query being run and thus we are
  calculating the ever-changing population variance.
 
 I think we should be calculating the population variance.

Why population variance and not sample variance?  In distributions
where the second moment about the mean exists, it's an unbiased
estimator of the variance.  In this, it's different from the
population variance.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-17 Thread Peter Eisentraut
On 1/20/15 6:32 PM, David G Johnston wrote:
 In fact, as far as
 the database knows, the values provided to this function do represent an
 entire population and such a correction would be unnecessary.  I guess it
 boils down to whether future queries are considered part of the population
 or whether the population changes upon each query being run and thus we are
 calculating the ever-changing population variance.

I think we should be calculating the population variance.  We are
clearly taking the population to be all past queries (from the last
reset point).  Otherwise calculating min and max wouldn't make sense.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-17 Thread Petr Jelinek

On 17/02/15 16:12, Andres Freund wrote:

On 2015-02-17 15:50:39 +0100, Petr Jelinek wrote:

On 17/02/15 03:07, Petr Jelinek wrote:

On 17/02/15 03:03, Andrew Dunstan wrote:

On 02/16/2015 08:57 PM, Andrew Dunstan wrote:

Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592



So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in
the reporting function. So it looks like the difference here should be
noise. Maybe we need some longer runs.



Yes, there are variations between individual runs, so it might really be
just that; I can leave it running for a much longer time tomorrow.



OK, so I let it run for more than an hour on a different system; the difference
is negligible - 14461 vs 14448 TPS. I think there is a bigger difference
between individual runs than between the two versions...


These numbers sound like you measured them without concurrency, am I
right? If so, the benchmark isn't that interesting - the computation
happens while a spinlock is held, and that'll mainly matter if there are
many backends running at the same time.



It's pgbench -j16 -c32


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-17 Thread Andres Freund
On 2015-02-17 15:50:39 +0100, Petr Jelinek wrote:
 On 17/02/15 03:07, Petr Jelinek wrote:
 On 17/02/15 03:03, Andrew Dunstan wrote:
 On 02/16/2015 08:57 PM, Andrew Dunstan wrote:
 Average of 3 runs of read-only pgbench on my system all with
 pg_stat_statement activated:
 HEAD:  20631
 SQRT:  20533
 SQRTD: 20592

 So using sqrtd the cost is 0.18%. I think that's acceptable.

 Actually, sqrt/sqrtd is not called in accumulating the stats, only in
 the reporting function. So it looks like the difference here should be
 noise. Maybe we need some longer runs.

 Yes, there are variations between individual runs, so it might really be
 just that; I can leave it running for a much longer time tomorrow.

 OK, so I let it run for more than an hour on a different system; the difference
 is negligible - 14461 vs 14448 TPS. I think there is a bigger difference
 between individual runs than between the two versions...

These numbers sound like you measured them without concurrency, am I
right? If so, the benchmark isn't that interesting - the computation
happens while a spinlock is held, and that'll mainly matter if there are
many backends running at the same time.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-17 Thread Petr Jelinek

On 17/02/15 03:07, Petr Jelinek wrote:

On 17/02/15 03:03, Andrew Dunstan wrote:


On 02/16/2015 08:57 PM, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

We definitely want this feature, I wished to have this info many
times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in
the reporting function. So it looks like the difference here should be
noise. Maybe we need some longer runs.



Yes, there are variations between individual runs, so it might really be
just that; I can leave it running for a much longer time tomorrow.



OK, so I let it run for more than an hour on a different system; the 
difference is negligible - 14461 vs 14448 TPS. I think there is a bigger 
difference between individual runs than between the two versions...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Andrew Dunstan


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:
On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com 
wrote:

We definitely want this feature, I wished to have this info many times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with 
pg_stat_statement activated:

HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.

cheers

andrew




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Petr Jelinek

On 21/01/15 17:32, Andrew Dunstan wrote:


On 01/21/2015 11:21 AM, Arne Scheffer wrote:




Why is it a bad thing to call the column stddev_samp, analogous to the
aggregate function, or to make a note in the documentation that the
sample stddev is used to compute the result?


...

But I will add a note to the documentation; that seems reasonable.



I agree this is worth mentioning in the doc.

In any case here some review from me:

We definitely want this feature, I wished to have this info many times.

The patch has couple of issues though:

The 1.2 to 1.3 upgrade file is named pg_stat_statements--1.2-1.3.sql and 
should be renamed to pg_stat_statements--1.2--1.3.sql (two dashes).


There is a new sqrtd function for fast sqrt calculation, which I think is 
a good idea as it will reduce the overhead of the additional calculation, 
and this is not something where a little loss of precision would matter 
too much. The problem is that the new function is actually not used 
anywhere in the code; I only see use of plain sqrt.
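
For readers who have not seen the patch: a fast approximate sqrt for doubles is
typically built from a bit-level initial guess refined by a couple of Newton-Raphson
steps. The sketch below is only illustrative of that general technique - it is an
assumption, not necessarily what the patch's sqrtd() actually does:

    #include <stdint.h>
    #include <string.h>

    /* Illustrative fast approximate sqrt for doubles (NOT the patch's code):
     * halve the exponent via integer arithmetic for a rough first guess,
     * then refine with a few Newton-Raphson iterations. */
    static double
    fast_sqrtd(double x)
    {
        uint64_t    bits;
        double      guess;
        int         i;

        if (x <= 0.0)
            return 0.0;

        memcpy(&bits, &x, sizeof(bits));
        bits = (bits >> 1) + UINT64_C(0x1FF8000000000000);  /* roughly exponent / 2 */
        memcpy(&guess, &bits, sizeof(guess));

        for (i = 0; i < 3; i++)
            guess = 0.5 * (guess + x / guess);  /* Newton-Raphson step */

        return guess;
    }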


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Peter Geoghegan
On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 We definitely want this feature, I wished to have this info many times.

I would still like to see a benchmark.


-- 
Peter Geoghegan




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Petr Jelinek

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote:

We definitely want this feature, I wished to have this info many times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with 
pg_stat_statement activated:

HEAD:  20631
SQRT:  20533
SQRTD: 20592


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Petr Jelinek

On 17/02/15 03:03, Andrew Dunstan wrote:


On 02/16/2015 08:57 PM, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

We definitely want this feature, I wished to have this info many
times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in
the reporting function. So it looks like the difference here should be
noise. Maybe we need some longer runs.



Yes, there are variations between individual runs, so it might really be 
just that; I can leave it running for a much longer time tomorrow.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Andrew Dunstan


On 02/16/2015 08:57 PM, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:
On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com 
wrote:
We definitely want this feature, I wished to have this info many 
times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with 
pg_stat_statement activated:

HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in 
the reporting function. So it looks like the difference here should be 
noise. Maybe we need some longer runs.


cheers

andrew





Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Petr Jelinek

On 17/02/15 02:57, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

We definitely want this feature, I wished to have this info many times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



I think so too.

I found one more issue with the 1.2--1.3 upgrade script, the DROP 
FUNCTION pg_stat_statements(); should be DROP FUNCTION 
pg_stat_statements(bool); since in 1.2 the function identity has changed.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-21 Thread Arne Scheffer



On Wed, 21 Jan 2015, Andrew Dunstan wrote:



On 01/21/2015 09:27 AM, Arne Scheffer wrote:

Sorry, corrected second try because of copy-paste mistakes:
VlG-Arne


Comments appreciated.
Definition var_samp = Sum of squared differences /n-1
Definition stddev_samp = sqrt(var_samp)
Example N=4
1.) Sum of squared differences
   1_4Sum(Xi-XM4)²
=
2.) adding nothing
   1_4Sum(Xi-XM4)²
  +0
  +0
  +0
=
3.) nothing changed
  1_4Sum(Xi-XM4)²
  +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
  +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²)
  +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²)
=
4.) parts reordered
   (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
  +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
  +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²)
  +1_1Sum(X1-XM1)²
=
5.)
   (X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-XM1)
+ (X1-XM1)²
=
6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm
(in
reverse order)
   (X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-X1)
+ 0
The missing piece is 4.) to 5.)
it's algebra, look at e.g.:
http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/







I have no idea what you are saying here.


I'm sorry for all that statistics stuff; my attempt was only to visualize
in detail the mathematical reason why the iterating part of Welford's
algorithm computes the current sum of squared differences at every step:

- therefore it's in my opinion better to call the variable sum_of_squared_diffs
  (every statistician will be confused by sum_of_variances,
   because:  sample variance = sum_of_squared_diffs / (n-1);
   have a look at Mr. Cook's explanation)

- therefore dividing by n-1 gives the unbiased estimator by definition
  (have a look at Mr. Cook's explanation)

- therefore I suggested (as a minor nomenclature issue) calling the
  column/description stddev_samp (PostgreSQL nomenclature) / sample_* to
  indicate that information
  (have a look at the PostgreSQL aggregate functions; they do it the same way)



Here are comments in email to me from the author of 
http://www.johndcook.com/blog/standard_deviation regarding the divisor 
used:


  My code is using the unbiased form of the sample variance, dividing
  by n-1.



I am relieved; now we are at least two persons saying that. :-)
Insert that into the commonly known definition from above,

stddev_samp = sqrt(var_samp)

and it's exactly my point.

Maybe I should add that in the code comments. Otherwise, I don't think we 
need a change.


Huh?

Why is it a bad thing to call the column stddev_samp, analogous to the
aggregate function, or to make a note in the documentation that the
sample stddev is used to compute the result?


I really think it's not a good strategy to make the user run a test or dive
into the source code to determine the divisor used.

E.g. David expected stddev_pop, so there is a need for documentation for cases 
with a small sample.

VlG-Arne



Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-21 Thread Andrew Dunstan


On 01/21/2015 11:21 AM, Arne Scheffer wrote:




Why is it a bad thing to call the column stddev_samp, analogous to the
aggregate function, or to make a note in the documentation that the
sample stddev is used to compute the result?



I think you are making a mountain out of a molehill, frankly. These 
stats are not intended as anything other than a pretty indication of the 
shape, to see if they are significantly influenced by outliers. For any 
significantly large sample size the difference will be negligible.


But I will add a note to the documentation; that seems reasonable.

cheers

andrew





Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-21 Thread Arne Scheffer


Andrew Dunstan wrote on 2015-01-21:

 On 01/21/2015 11:21 AM, Arne Scheffer wrote:



 Why is it a bad thing to call the column stddev_samp, analogous to the
 aggregate function, or to make a note in the documentation that the
 sample stddev is used to compute the result?


 I think you are making a mountain out of a molehill, frankly. These
 stats are not intended as anything other than a pretty indication of
 the
 shape, to see if they are significantly influenced by outliers. For
 any
 significantly large sample size the difference will be negligible.

You're right; maybe I exaggerated the statistics part a bit.
I wanted to help, because the patch is of interest to us.
I will try to keep focused in the future.


 But I will add a note to the documentation; that seems reasonable.


*happy*

Thx

Arne





Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-21 Thread Arne Scheffer


David G Johnston wrote on 2015-01-21:
 Andrew Dunstan wrote
  On 01/20/2015 01:26 PM, Arne Scheffer wrote:

  And a very minor aspect:
  The term standard deviation in your code stands for
  (corrected) sample standard deviation, I think,
  because you divide by n-1 instead of n to keep the
  estimator unbiased.
  How about mentioning the prefix sample
  to indicate this being the estimator?


  I don't understand. I'm following pretty exactly the calculations
  stated
  at <http://www.johndcook.com/blog/standard_deviation/>


  I'm not a statistician. Perhaps others who are more literate in
  statistics can comment on this paragraph.

 I'm largely in the same boat as Andrew but...

 I take it that Arne is referring to:

 http://en.wikipedia.org/wiki/Bessel's_correction

Yes, it is.

 but the mere presence of an (n-1) divisor does not mean that is what
 is
 happening.  In this particular situation I believe the (n-1) simply
 is a
 necessary part of the recurrence formula and not any attempt to
 correct for
 sampling bias when estimating a population's variance.

That's wrong; it's applied at the end to the sum of squared differences
and is therefore, by definition, the corrected sample standard deviation
estimator.

 In fact, as
 far as
 the database knows, the values provided to this function do represent
 an
 entire population and such a correction would be unnecessary.  I

That would probably be an exotic assumption in a working database,
and it is not what is computed here!

 guess it
 boils down to whether future queries are considered part of the
 population
 or whether the population changes upon each query being run and thus
 we are
 calculating the ever-changing population variance.

Yes, indeed correct.
And exactly to avoid that misunderstanding, I suggested using the sample term.
To speak in PostgreSQL terms: what Andrew's/Welford's algorithm computes
is stddev_samp(le), not stddev_pop(ulation).
(stddev itself is only kept in Postgres for historical reasons; see
http://www.postgresql.org/docs/9.4/static/functions-aggregate.html
Table 9-43.)

VlG-Arne

 Note point 3 in the linked Wikipedia article.





 David J.









Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-21 Thread Arne Scheffer
 I don't understand. I'm following pretty exactly the calculations
 stated
 at <http://www.johndcook.com/blog/standard_deviation/>

 I'm not a statistician. Perhaps others who are more literate in

Maybe I'm mistaken here, but I think the algorithm is not that complicated.
I'll try to explain it further:

Comments appreciated.

Definition var_samp = Sum of squared differences /n-1
Definition stddev_samp = sqrt(var_samp)

Example N=4

1.) Sum of squared differences
  1_4Sum(Xi-XM4)²
=
2.) adding nothing
  1_4Sum(Xi-XM4)²
 +0
 +0
 +0
=
3.) nothing changed
 1_4Sum(Xi-XM4)²
 +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
 +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM3)²)
 +(-1_1Sum(Xi-XM2)²+1_1Sum(Xi-XM3)²)

=
4.) parts reordered
  (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
 +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
 +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM2)²)
 +1_1Sum(X1-XM1)²
=
5.)
  (X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-XM1)
+ (X1-XM1)²
=
6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm (in
reverse order)
  (X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-X1)
+ 0

The missing piece is 4.) to 5.)
it's algebra, look at e.g.:
http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/

 Thanks. Still not quite sure what to do, though :-) I guess in the
 end we want the answer to come up with similar results to the builtin
 stddev SQL function. I'll try to set up a test program, to see if we do.

If you want to go this way: maybe this is one of the very few times you
have to use a small sample ;-)

VlG-Arne


 cheers

 andrew






Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-21 Thread Arne Scheffer
Sorry, corrected second try because of copy-paste mistakes:
VlG-Arne

 Comments appreciated.

 Definition var_samp = Sum of squared differences /n-1
 Definition stddev_samp = sqrt(var_samp)

 Example N=4

 1.) Sum of squared differences
   1_4Sum(Xi-XM4)²
 =
 2.) adding nothing
   1_4Sum(Xi-XM4)²
  +0
  +0
  +0
 =
 3.) nothing changed
  1_4Sum(Xi-XM4)²
  +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
  +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²)
  +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²)

 =
 4.) parts reordered
   (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
  +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
  +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²)
  +1_1Sum(X1-XM1)²
 =
 5.)
   (X4-XM4)(X4-XM3)
 + (X3-XM3)(X3-XM2)
 + (X2-XM2)(X2-XM1)
 + (X1-XM1)²
 =
 6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm
 (in
 reverse order)
   (X4-XM4)(X4-XM3)
 + (X3-XM3)(X3-XM2)
 + (X2-XM2)(X2-X1)
 + 0

 The missing piece is 4.) to 5.)
 it's algebra, look at e.g.:
 http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/






Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-21 Thread Andrew Dunstan


On 01/21/2015 09:27 AM, Arne Scheffer wrote:

Sorry, corrected second try because of copy-paste mistakes:
VlG-Arne


Comments appreciated.
Definition var_samp = Sum of squared differences /n-1
Definition stddev_samp = sqrt(var_samp)
Example N=4
1.) Sum of squared differences
   1_4Sum(Xi-XM4)²
=
2.) adding nothing
   1_4Sum(Xi-XM4)²
  +0
  +0
  +0
=
3.) nothing changed
  1_4Sum(Xi-XM4)²
  +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
  +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²)
  +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²)
=
4.) parts reordered
   (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
  +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
  +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²)
  +1_1Sum(X1-XM1)²
=
5.)
   (X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-XM1)
+ (X1-XM1)²
=
6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm
(in
reverse order)
   (X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-X1)
+ 0
The missing piece is 4.) to 5.)
it's algebra, look at e.g.:
http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/







I have no idea what you are saying here.

Here are comments in email to me from the author of 
http://www.johndcook.com/blog/standard_deviation regarding the divisor 
used:


   My code is using the unbiased form of the sample variance, dividing
   by n-1.

   It's usually not worthwhile to make a distinction between a sample
   and a population because the population is often itself a sample.
   For example, if you could measure the height of everyone on earth at
   one instance, that's the entire population, but it's still a sample
   from all who have lived and who ever will live.

   Also, for large n, there's hardly any difference between 1/n and
   1/(n-1).


Maybe I should add that in the code comments. Otherwise, I don't think 
we need a change.



cheers

andrew






Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread Andrew Dunstan


On 01/20/2015 01:26 PM, Arne Scheffer wrote:

Interesting patch.
I did a quick review looking only into the patch file.

The sum of variances variable contains
the sum of squared differences instead, I think.


Umm, no. It's not.

   e->counters.sum_var_time +=
   (total_time - old_mean) * (total_time - e->counters.mean_time);

This is not a square that's being added. old_mean is not the same as 
e->counters.mean_time.


Since the variance is this value divided by (n - 1), AIUI, I think sum 
of variances isn't a bad description. I'm open to alternative suggestions.
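
A tiny worked example shows what the accumulator ends up holding. Feeding the 
values 1, 2, 3 through that update (mean updated before sum_var_time, as the 
use of both old_mean and the new mean_time implies):

   after 1:  mean = 1.0,  sum_var_time = 0
   after 2:  old_mean = 1.0, mean = 1.5, increment = (2 - 1.0)(2 - 1.5) = 0.5
   after 3:  old_mean = 1.5, mean = 2.0, increment = (3 - 1.5)(3 - 2.0) = 1.5

so sum_var_time = 2.0, which equals the sum of squared differences from the 
final mean 2.0 (1 + 0 + 1); dividing by n - 1 = 2 then gives the sample 
variance 1.0.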





And a very minor aspect:
The term standard deviation in your code stands for
(corrected) sample standard deviation, I think,
because you divide by n-1 instead of n to keep the
estimator unbiased.
How about mentioning the prefix sample
to indicate this being the estimator?



I don't understand. I'm following pretty exactly the calculations stated 
at http://www.johndcook.com/blog/standard_deviation/



I'm not a statistician. Perhaps others who are more literate in 
statistics can comment on this paragraph.





And I'm sure I'm missing C specifics (again)
(or it's the reduced patch file scope),
but you introduce sqrtd, but sqrt is called?



Good catch. Will fix.

cheers

andrew




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread Andrew Dunstan


On 01/20/2015 06:32 PM, David G Johnston wrote:

Andrew Dunstan wrote

On 01/20/2015 01:26 PM, Arne Scheffer wrote:

And a very minor aspect:
The term standard deviation in your code stands for
(corrected) sample standard deviation, I think,
because you divide by n-1 instead of n to keep the
estimator unbiased.
How about mentioning the prefix sample
to indicate this being the estimator?


I don't understand. I'm following pretty exactly the calculations stated
at <http://www.johndcook.com/blog/standard_deviation/>


I'm not a statistician. Perhaps others who are more literate in
statistics can comment on this paragraph.

I'm largely in the same boat as Andrew but...

I take it that Arne is referring to:

http://en.wikipedia.org/wiki/Bessel's_correction

but the mere presence of an (n-1) divisor does not mean that is what is
happening.  In this particular situation I believe the (n-1) simply is a
necessary part of the recurrence formula and not any attempt to correct for
sampling bias when estimating a population's variance.  In fact, as far as
the database knows, the values provided to this function do represent an
entire population and such a correction would be unnecessary.  I guess it
boils down to whether future queries are considered part of the population
or whether the population changes upon each query being run and thus we are
calculating the ever-changing population variance.  Note point 3 in the
linked Wikipedia article.





Thanks. Still not quite sure what to do, though :-) I guess in the end 
we want the answer to come up with similar results to the builtin stddev 
SQL function. I'll try to set up a test program, to see if we do.


cheers

andrew




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread David G Johnston
Andrew Dunstan wrote
 On 01/20/2015 01:26 PM, Arne Scheffer wrote:

 And a very minor aspect:
 The term standard deviation in your code stands for
 (corrected) sample standard deviation, I think,
 because you divide by n-1 instead of n to keep the
 estimator unbiased.
 How about mentioning the prefix sample
 to indicate this being the estimator?
 
 
 I don't understand. I'm following pretty exactly the calculations stated 
 at <http://www.johndcook.com/blog/standard_deviation/>
 
 
 I'm not a statistician. Perhaps others who are more literate in 
 statistics can comment on this paragraph.

I'm largely in the same boat as Andrew but...

I take it that Arne is referring to:

http://en.wikipedia.org/wiki/Bessel's_correction

but the mere presence of an (n-1) divisor does not mean that is what is
happening.  In this particular situation I believe the (n-1) simply is a
necessary part of the recurrence formula and not any attempt to correct for
sampling bias when estimating a population's variance.  In fact, as far as
the database knows, the values provided to this function do represent an
entire population and such a correction would be unnecessary.  I guess it
boils down to whether future queries are considered part of the population
or whether the population changes upon each query being run and thus we are
calculating the ever-changing population variance.  Note point 3 in the
linked Wikipedia article.

David J.







Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread Arne Scheffer


Andrew Dunstan wrote on 2015-01-20:

 On 01/20/2015 01:26 PM, Arne Scheffer wrote:
 Interesting patch.
 I did a quick review looking only into the patch file.

 The sum of variances variable contains
 the sum of squared differences instead, I think.

 Umm, no. It's not.

Umm, yes, I think it is ;-)

   e->counters.sum_var_time +=
   (total_time - old_mean) * (total_time - e->counters.mean_time);
 This is not a square that's being added.

That's correct.
Nevertheless, it's the difference between the computed sum of squared
differences and the preceding one that is added at every step.

 old_mean is not the same as
 e->counters.mean_time.

 Since the variance is this value divided by (n - 1), AIUI, I think
 sum
 of variances isn't a bad description. I'm open to alternative
 suggestions.

 And a very minor aspect:
 The term standard deviation in your code stands for
 (corrected) sample standard deviation, I think,
 because you divide by n-1 instead of n to keep the
 estimator unbiased.
 How about mentioning the prefix sample
 to indicate this being the estimator?

 I don't understand. I'm following pretty exactly the calculations
 stated
 at http://www.johndcook.com/blog/standard_deviation/

(There is nothing wrong with those calculations; Welford's algorithm
 is simply adding the differences mentioned above sequentially.)

VlG-Arne


 I'm not a statistician. Perhaps others who are more literate in
 statistics can comment on this paragraph.

 And I'm sure I'm missing C specifics (again)
 (or it's the reduced patch file scope),
 but you introduce sqrtd, but sqrt is called?


 Good catch. Will fix.

 cheers

 andrew




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread Arne Scheffer
Interesting patch.
I did a quick review looking only into the patch file.

The sum of variances variable contains
the sum of squared differences instead, I think.

And a very minor aspect:
The term standard deviation in your code stands for
(corrected) sample standard deviation, I think,
because you divide by n-1 instead of n to keep the
estimator unbiased.
How about mentioning the prefix sample
to indicate this being the estimator?

And I'm sure I'm missing C specifics (again)
(or it's the reduced patch file scope),
but you introduce sqrtd, but sqrt is called?

VlG

Arne














Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-06 Thread Andrew Dunstan


On 12/21/2014 02:50 PM, Andrew Dunstan wrote:


On 12/21/2014 02:12 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 12/21/2014 01:23 PM, Alvaro Herrera wrote:
The point, I think, is that without atomic instructions you have to hold
a lock while incrementing the counters.

Hmm, do we do that now?

We already have a spinlock mutex around the counter adjustment code, so
I'm not sure why this discussion is being held.


Because Peter suggested we might be able to use atomics. I'm a bit 
dubious that we can for min and max anyway.





I would like someone more versed in numerical analysis than me to
tell me how safe using sum of squares actually is in our case.

That, on the other hand, might be a real issue.  I'm afraid that
accumulating across a very long series of statements could lead
to severe roundoff error in the reported values, unless we use
a method chosen for numerical stability.





Right.

The next question along those lines is whether we need to keep a 
running mean or whether that can safely be calculated on the fly. The 
code at http://www.johndcook.com/blog/standard_deviation/ does keep 
a running mean, and maybe that's required to prevent ill conditioned 
results, although I'm not quite sure I see how it would. But this isn't my 
area of expertise.





I played it safe and kept a running mean, and since it's there and 
useful in itself I exposed it via the function, so there are four new 
columns, min_time, max_time, mean_time and stddev_time.


I'll add this to the upcoming commitfest.

cheers

andrew
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 2709909..975a637 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -4,8 +4,9 @@ MODULE_big = pg_stat_statements
 OBJS = pg_stat_statements.o $(WIN32RES)
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
-	pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.3.sql pg_stat_statements--1.2--1.3.sql \
+	pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \
+	pg_stat_statements--unpackaged--1.0.sql
 PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2-1.3.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2-1.3.sql
new file mode 100644
index 000..506959d
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.2-1.3.sql
@@ -0,0 +1,47 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.3'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements();
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+OUT userid oid,
+OUT dbid oid,
+OUT queryid bigint,
+OUT query text,
+OUT calls int8,
+OUT total_time float8,
+OUT min_time float8,
+OUT max_time float8,
+OUT mean_time float8,
+OUT stddev_time float8,
+OUT rows int8,
+OUT shared_blks_hit int8,
+OUT shared_blks_read int8,
+OUT shared_blks_dirtied int8,
+OUT shared_blks_written int8,
+OUT local_blks_hit int8,
+OUT local_blks_read int8,
+OUT local_blks_dirtied int8,
+OUT local_blks_written int8,
+OUT temp_blks_read int8,
+OUT temp_blks_written int8,
+OUT blk_read_time float8,
+OUT blk_write_time float8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_3'
+LANGUAGE C STRICT VOLATILE;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
deleted file mode 100644
index 5bfa9a5..000
--- a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
+++ /dev/null
@@ -1,44 +0,0 @@
-/* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
-
--- Register functions.
-CREATE FUNCTION pg_stat_statements_reset()
-RETURNS void
-AS 'MODULE_PATHNAME'
-LANGUAGE C;
-
-CREATE FUNCTION pg_stat_statements(IN showtext boolean,
-OUT userid oid,
-OUT dbid oid,
-OUT queryid bigint,
-OUT query text,
-OUT calls int8,
-OUT total_time float8,
-OUT rows int8,
-OUT shared_blks_hit int8,
-OUT shared_blks_read int8,
-OUT shared_blks_dirtied int8,
-OUT 

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-12-21 Thread Andrew Dunstan


On 04/07/2014 04:19 PM, Tom Lane wrote:

Alvaro Herrera alvhe...@2ndquadrant.com writes:

I just noticed that this patch not only adds min,max,stddev, but it also
adds the ability to reset an entry's counters.  This hasn't been
mentioned in this thread at all; there has been no discussion on whether
this is something we want to have, nor on whether this is the right API.
What it does is add a new function pg_stat_statements_reset_time() which
resets the min and max values from all function's entries, without
touching the stddev state variables nor the number of calls.  Note
there's no ability to reset the values for a single function.

That seems pretty bizarre.  At this late date, the best thing would
probably be to strip out the undiscussed functionality.  It can get
resubmitted for 9.5 if there's a real use-case for it.





This seems to have fallen through the slats completely. I'd like to 
revive it for 9.5, without the extra function. If someone wants to be 
able to reset stats on a finer grained basis that should probably be a 
separate patch.


One thing that bothered me slightly is that the stddev calculation 
appears to use a rather naive sum of squares method of calculation, 
which is known to give ill conditioned or impossible results under some 
pathological conditions: see 
http://www.johndcook.com/blog/standard_deviation for some details. I 
don't know if our results are likely to be so skewed as to produce 
pathological results, but ISTM we should probably take the trouble to be 
correct and use Welford's method anyway.
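
For readers unfamiliar with it, Welford's method keeps a running mean and a
running sum of squared differences, so no huge intermediate sums are formed.
A minimal, self-contained sketch in C (illustrative names only, not the
patch's code):

    #include <math.h>

    typedef struct WelfordState
    {
        long    n;       /* number of observations so far */
        double  mean;    /* running mean */
        double  sum_sq;  /* running sum of squared differences from the mean */
    } WelfordState;

    static void
    welford_update(WelfordState *s, double x)
    {
        double  old_mean = s->mean;

        s->n += 1;
        s->mean += (x - old_mean) / s->n;
        s->sum_sq += (x - old_mean) * (x - s->mean);
    }

    /* sample standard deviation: sqrt(sum_sq / (n - 1)), defined for n > 1 */
    static double
    welford_stddev_samp(const WelfordState *s)
    {
        return (s->n > 1) ? sqrt(s->sum_sq / (s->n - 1)) : 0.0;
    }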


On my blog Peter Geoghegan mentioned something about atomic 
fetch-and-add being useful here, but I'm not quite sure what that's 
referring to. Perhaps someone can give me a pointer.


Thoughts?

cheers

andrew






Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-12-21 Thread Alvaro Herrera
Andrew Dunstan wrote:

 On my blog Peter Geoghegan mentioned something about atomic fetch-and-add
 being useful here, but I'm not quite sure what that's referring to. Perhaps
 someone can give me a pointer.

The point, I think, is that without atomic instructions you have to hold
a lock while incrementing the counters.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-12-21 Thread Andres Freund
On December 21, 2014 7:23:27 PM CET, Alvaro Herrera alvhe...@2ndquadrant.com 
wrote:
Andrew Dunstan wrote:

 On my blog Peter Geoghegan mentioned something about atomic
fetch-and-add
 being useful here, but I'm not quite sure what that's referring to.
Perhaps
 someone can give me a pointer.

The point, I think, is that without atomic instructions you have to
hold
a lock while incrementing the counters.

Port/atomics.h provides an abstraction for doing such atomic manipulations.
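
For a plain counter like calls that would look roughly like this -- a sketch
only, assuming the pg_atomic_* 64-bit primitives (whether 64-bit atomics are
available is platform dependent):

#include "port/atomics.h"

typedef struct CountersAtomic
{
    pg_atomic_uint64 calls;     /* # of times executed */
} CountersAtomic;

/* once, when the shared entry is created */
static void
counters_init(CountersAtomic *c)
{
    pg_atomic_init_u64(&c->calls, 0);
}

/* per execution: no spinlock needed for a simple increment */
static void
counters_bump(CountersAtomic *c)
{
    pg_atomic_fetch_add_u64(&c->calls, 1);
}

The double-valued accumulators (total time, min, max) are the harder part,
since there is no atomic add for doubles.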

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-12-21 Thread Andrew Dunstan


On 12/21/2014 01:23 PM, Alvaro Herrera wrote:

Andrew Dunstan wrote:


On my blog Peter Geoghegan mentioned something about atomic fetch-and-add
being useful here, but I'm not quite sure what that's referring to. Perhaps
someone can give me a pointer.

The point, I think, is that without atomic instructions you have to hold
a lock while incrementing the counters.



Hmm, do we do that now? That won't work for the stddev method I was 
referring to, although it could for the sum of squares method. In that 
case I would like someone more versed in numerical analysis than me to 
tell me how safe using sum of squares actually is in our case. And how 
about min and max? They don't look like good candidates for atomic 
operations.
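
(For what it's worth, a lock-free min update is possible with a
compare-and-swap retry loop, but that's a loop rather than a single
instruction; a C11 sketch just to show the shape, not proposed code:)

#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* min_bits holds the current minimum, stored as the bit pattern of a double */
static void
atomic_update_min(_Atomic uint64_t *min_bits, double candidate)
{
    uint64_t    old_bits = atomic_load(min_bits);

    for (;;)
    {
        double      old_val;
        uint64_t    new_bits;

        memcpy(&old_val, &old_bits, sizeof(old_val));
        if (candidate >= old_val)
            break;              /* current minimum is already smaller */

        memcpy(&new_bits, &candidate, sizeof(new_bits));

        /* on failure, old_bits is reloaded with the current value */
        if (atomic_compare_exchange_weak(min_bits, &old_bits, new_bits))
            break;
    }
}

Under contention that loop can spin, so it isn't obviously better than the
spinlock we already take around the counters.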


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-12-21 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 12/21/2014 01:23 PM, Alvaro Herrera wrote:
 The point, I think, is that without atomic instructions you have to hold
 a lock while incrementing the counters.

 Hmm, do we do that now?

We already have a spinlock mutex around the counter adjustment code, so
I'm not sure why this discussion is being held.

 I would like someone more versed in numerical analysis than me to 
 tell me how safe using sum of squares actually is in our case.

That, on the other hand, might be a real issue.  I'm afraid that
accumulating across a very long series of statements could lead
to severe roundoff error in the reported values, unless we use
a method chosen for numerical stability.
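
A toy illustration of the failure mode, nothing to do with the patch itself:
with the textbook formula variance = sumsq/n - mean^2, two nearly equal large
quantities get subtracted and the signal is lost in the rounding error:

#include <stdio.h>

int
main(void)
{
    double      n = 0.0, sum = 0.0, sumsq = 0.0;

    /* a million executions, all close to 100000.xxx ms */
    for (int i = 0; i < 1000000; i++)
    {
        double      x = 100000.0 + (i % 10) * 0.001;

        n += 1.0;
        sum += x;
        sumsq += x * x;
    }

    double      mean = sum / n;
    double      var = sumsq / n - mean * mean;  /* catastrophic cancellation */

    /* the true variance of this data is about 8.25e-06 */
    printf("naive variance = %g\n", var);
    return 0;
}

Welford-style updates avoid the subtraction of large numbers and so don't
degrade this way.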

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-12-21 Thread Andrew Dunstan


On 12/21/2014 02:12 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 12/21/2014 01:23 PM, Alvaro Herrera wrote:

The point, I think, is that without atomic instructions you have to hold
a lock while incrementing the counters.

Hmm, do we do that now?

We already have a spinlock mutex around the counter adjustment code, so
I'm not sure why this discussion is being held.


Because Peter suggested we might be able to use atomics. I'm a bit 
dubious that we can for min and max anyway.





I would like someone more versed in numerical analysis than me to
tell me how safe using sum of squares actually is in our case.

That, on the other hand, might be a real issue.  I'm afraid that
accumulating across a very long series of statements could lead
to severe roundoff error in the reported values, unless we use
a method chosen for numerical stability.





Right.

The next question along those lines is whether we need to keep a running 
mean or whether that can safely be calculated on the fly. The code at 
http://www.johndcook.com/blog/standard_deviation/ does keep a running 
mean, and maybe that's required to prevent ill conditioned results, 
although I'm not quite sure I see how it would. But this isn't my area of 
expertise.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-04-07 Thread Alvaro Herrera
KONDO Mitsumasa wrote:
 Hi all,
 
 I think this patch has been completely forgotten, which feels very unfortunate :(
 
 Min, max, and stddev are basic statistics in general monitoring tools,
 so I'd like to push it.

I just noticed that this patch not only adds min,max,stddev, but it also
adds the ability to reset an entry's counters.  This hasn't been
mentioned in this thread at all; there has been no discussion on whether
this is something we want to have, nor on whether this is the right API.

What it does is add a new function pg_stat_statements_reset_time() which
resets the min and max values from all function's entries, without
touching the stddev state variables nor the number of calls.  Note
there's no ability to reset the values for a single function.  

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-04-07 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 I just noticed that this patch not only adds min,max,stddev, but it also
 adds the ability to reset an entry's counters.  This hasn't been
 mentioned in this thread at all; there has been no discussion on whether
 this is something we want to have, nor on whether this is the right API.

 What it does is add a new function pg_stat_statements_reset_time() which
 resets the min and max values from all function's entries, without
 touching the stddev state variables nor the number of calls.  Note
 there's no ability to reset the values for a single function.  

That seems pretty bizarre.  At this late date, the best thing would
probably be to strip out the undiscussed functionality.  It can get
resubmitted for 9.5 if there's a real use-case for it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-04-07 Thread Robert Haas
On Mon, Apr 7, 2014 at 4:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 I just noticed that this patch not only adds min,max,stddev, but it also
 adds the ability to reset an entry's counters.  This hasn't been
 mentioned in this thread at all; there has been no discussion on whether
 this is something we want to have, nor on whether this is the right API.

 What it does is add a new function pg_stat_statements_reset_time() which
 resets the min and max values from all function's entries, without
 touching the stddev state variables nor the number of calls.  Note
 there's no ability to reset the values for a single function.

 That seems pretty bizarre.  At this late date, the best thing would
 probably be to strip out the undiscussed functionality.  It can get
 resubmitted for 9.5 if there's a real use-case for it.

If I'm understanding correctly, that actually seems reasonably
sensible.  I mean, the big problem with min/max is that you might be
taking averages and standard deviations over a period of months, but
the maximum and minimum are not that meaningful over such a long
period.  You might well want to keep a longer-term average while
seeing the maximum and minimum for, say, today.  I don't know exactly
how useful that is, but it doesn't seem obviously dumb.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-04-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 7, 2014 at 4:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 What it does is add a new function pg_stat_statements_reset_time() which
 resets the min and max values from all function's entries, without
 touching the stddev state variables nor the number of calls.  Note
 there's no ability to reset the values for a single function.

 That seems pretty bizarre.  At this late date, the best thing would
 probably be to strip out the undiscussed functionality.  It can get
 resubmitted for 9.5 if there's a real use-case for it.

 If I'm understanding correctly, that actually seems reasonably
 sensible.  I mean, the big problem with min/max is that you might be
 taking averages and standard deviations over a period of months, but
 the maximum and minimum are not that meaningful over such a long
 period.  You might well want to keep a longer-term average while
 seeing the maximum and minimum for, say, today.  I don't know exactly
 how useful that is, but it doesn't seem obviously dumb.

Well, maybe it's okay, but not being able to reset the stddev state seems
odd, and the chosen function name seems odder.  In any case, the time to
be discussing the design of such functionality was several months ago.
I'm still for ripping it out for 9.4, rather than being stuck with
behavior that's not been adequately discussed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-03-04 Thread KONDO Mitsumasa

Hi all,

I think this patch has been completely forgotten, which feels very unfortunate :(

Min, max, and stddev are basic statistics in general monitoring tools,
so I'd like to push it.

(2014/02/12 15:45), KONDO Mitsumasa wrote:

(2014/01/29 17:31), Rajeev rastogi wrote:

No Issue, you can share me the test cases, I will take the performance report.

The attached patch is updated to support the latest pg_stat_statements. It
includes min, max, and stddev statistics. Could you run a compile test in your
Windows environment? I think the compile error has been fixed.

We had a discussion in the community about which statistics are most useful; I
think both kinds have strong and weak points. For a statement executed only a
few (2 or 3) times, stddev is close to meaningless because it cannot be
estimated precisely, whereas min and max still work well because they are
measured values rather than estimates. On the other hand, for frequently
executed statements the positions are reversed: stddev becomes very useful for
characterizing the statement as a whole, while min and max may just be extreme
outliers.
At the end of the day, these values complement each other when we want to
understand actual statements, and my past experience showed no performance
problems with this patch. So I'd like to implement all of these values in
pg_stat_statements.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-02-18 Thread KONDO Mitsumasa

(2014/02/17 21:44), Rajeev rastogi wrote:

It got compiled successfully on Windows.

Thank you for checking on Windows! It is very helpful for me.

Can we allow adding all three statistics? I think stddev alone is difficult for
users to picture, but with min and max as well we can picture each statement's
behavior much more easily. And I would rather not have to maintain this feature
in my monitoring tool, pg_statsinfo, because it is beneficial to many
pg_stat_statements users and to improving Postgres performance in the future.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-02-17 Thread Rajeev rastogi
On 12 February 2014 12:16, KONDO Mitsumasa Wrote:
 
 Hi Rajeev,
 
  (2014/01/29 17:31), Rajeev rastogi wrote:
  No Issue, you can share me the test cases, I will take the
 performance report.
  The attached patch is updated to support the latest pg_stat_statements. It
  includes min, max, and stddev statistics. Could you run a compile test in
  your Windows environment? I think the compile error has been fixed.

It got compiled successfully on Windows.

Thanks and Regards,
Kumar Rajeev Rastogi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-02-11 Thread KONDO Mitsumasa

Hi Rajeev,


(2014/01/29 17:31), Rajeev rastogi wrote:

No Issue, you can share me the test cases, I will take the performance report.

The attached patch is updated to support the latest pg_stat_statements. It
includes min, max, and stddev statistics. Could you run a compile test in your
Windows environment? I think the compile error has been fixed.

We had a discussion in the community about which statistics are most useful; I
think both kinds have strong and weak points. For a statement executed only a
few (2 or 3) times, stddev is close to meaningless because it cannot be
estimated precisely, whereas min and max still work well because they are
measured values rather than estimates. On the other hand, for frequently
executed statements the positions are reversed: stddev becomes very useful for
characterizing the statement as a whole, while min and max may just be extreme
outliers.

At the end of the day, these values complement each other when we want to
understand actual statements, and my past experience showed no performance
problems with this patch. So I'd like to implement all of these values in
pg_stat_statements.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center





*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***
*** 19,24  CREATE FUNCTION pg_stat_statements(IN showtext boolean,
--- 19,27 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 41,43  CREATE VIEW pg_stat_statements AS
--- 44,51 
SELECT * FROM pg_stat_statements(true);
  
  GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***
*** 9,14  RETURNS void
--- 9,19 
  AS 'MODULE_PATHNAME'
  LANGUAGE C;
  
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
  CREATE FUNCTION pg_stat_statements(IN showtext boolean,
  OUT userid oid,
  OUT dbid oid,
***
*** 16,21  CREATE FUNCTION pg_stat_statements(IN showtext boolean,
--- 21,29 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 42,44  GRANT SELECT ON pg_stat_statements TO PUBLIC;
--- 50,53 
  
  -- Don't want this to be available to non-superusers.
  REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;
*** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
***
*** 4,8 
--- 4,9 
  \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit
  
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset();
+ ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time();
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements();
  ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements;
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***
*** 106,111  static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
--- 106,113 
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
+ #define EXEC_TIME_INIT_MIN	DBL_MAX		/* initial execution min time */
+ #define EXEC_TIME_INIT_MAX	-DBL_MAX	/* initial execution max time */
  
  #define JUMBLE_SIZE1024	/* query serialization buffer size */
  
***
*** 137,142  typedef struct Counters
--- 139,147 
  {
  	int64		calls;			/* # of times executed */
  	double		total_time;		/* total execution time, in msec */
+ 	double		total_sqtime;		/* cumulated square execution time, in msec */
+ 	double		min_time;		/* minimum execution time, in msec */
+ 	double		max_time;		/* maximum execution time, in msec */
  	int64		rows;			/* total # of retrieved or affected rows */
  	int64		shared_blks_hit;	/* # of shared buffer hits */
  	int64		shared_blks_read;		/* # of shared disk blocks read */
***
*** 274,283  void		_PG_init(void);
--- 279,290 
  void		_PG_fini(void);
  
  Datum		

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-31 Thread Mitsumasa KONDO
2014-01-31 Peter Geoghegan p...@heroku.com

 On Thu, Jan 30, 2014 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  In reality, actual applications
  could hardly be further from the perfectly uniform distribution of
  distinct queries presented here.
 
  Yeah, I made the same point in different words.  I think any realistic
  comparison of this code to what we had before needs to measure a workload
  with a more plausible query frequency distribution.

 Even though that distribution just doesn't square with anybody's
 reality, you can still increase the pg_stat_statements.max setting to
 10k and the problem goes away at little cost (a lower setting is
 better, but a setting high enough to cache everything is best). But
 you're not going to have terribly much use for pg_stat_statements
 anyway; if you really do experience churn at that rate with 5,000
 possible entries, the module is ipso facto useless, and should be
 disabled.

I ran an extra test of your patch and mine with the pg_stat_statements.max
setting = 10k, with otherwise the same settings and servers. They are faster
than the past results.

method    |  try1  |  try2  |  try3
----------+--------+--------+-------
peter 3   | 6.769  | 6.784  | 6.785
method 5  | 6.770  | 6.774  | 6.761


I think the most significant overhead in pg_stat_statements is the delete and
insert cost of the hash table update, not the LWLocks. If the LWLock were the
main overhead, we would see it with pgbench -S, because that has a single
SELECT pattern, which is the worst lock-conflict case. But we don't see such a
result.
I'm not sure about dynahash.c, but we can see the hash-conflict case in that
code.
IMHO, it might be heavy because it has to walk the collision list and compare
entries until it finds one that doesn't conflict.

And the past results show that your patch's weakest point is that deleting an
old statement and inserting a new one is very costly, as you know. It amplifies
the update (delete and insert) cost of the pg_stat_statements hash table. So
you proposed a new default of 10k for the max setting, but that is not an
essential solution, because the old pg_stat_statements also performs well at
that size. And when we set max=10k with your patch but only want the 1000
most-used queries from pg_stat_statements, we have to run an ORDER BY query
with LIMIT 1000. The sort cost is relatively high, so the monitoring query
becomes slow and expensive. With the old code we can simply set
pg_stat_statements.max=1000, and its performance is not that bad; that is the
best setting for getting information about the 1000 most-used queries.


That's all my assumption.

Sorry for running only a few extra tests; I had no time in my office today.
If we want, I can run a 1/N-distribution pgbench test next week; I'll modify my
perl script a little to create multiple SQL files with various sleep times.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-31 Thread Peter Geoghegan
On Fri, Jan 31, 2014 at 5:07 AM, Mitsumasa KONDO
kondo.mitsum...@gmail.com wrote:
 And past result shows that your patch's most weak point is that deleting
 most old statement
 and inserting new old statement cost is very high, as you know.

No, there is no reason to imagine that entry_dealloc() is any slower,
really. There will perhaps be some additional contention with shared
lockers, but that isn't likely to be a major aspect. When the hash
table is full, in reality at that point it's very unlikely that there
will be two simultaneous sessions that need to create a new entry. As
I said, on many of the systems I work with it takes weeks to see a new
entry. This will be especially true now that the *.max default is
5,000.

 It accelerate to affect
 update(delete and insert) cost in pg_stat_statements table. So you proposed
 new setting
 10k in default max value. But it is not essential solution, because it is
 also good perfomance
  for old pg_stat_statements.

I was merely pointing out that that would totally change the outcome
of your very artificial test-case. Decreasing the number of statements
to 5,000 would too. I don't think we should assign much weight to any
test case where the large majority or all statistics are wrong
afterwards, due to there being so much churn.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-31 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Fri, Jan 31, 2014 at 5:07 AM, Mitsumasa KONDO
 kondo.mitsum...@gmail.com wrote:
 It accelerate to affect
 update(delete and insert) cost in pg_stat_statements table. So you proposed
 new setting
 10k in default max value. But it is not essential solution, because it is
 also good perfomance
 for old pg_stat_statements.

 I was merely pointing out that that would totally change the outcome
 of your very artificial test-case. Decreasing the number of statements
 to 5,000 would too. I don't think we should assign much weight to any
 test case where the large majority or all statistics are wrong
 afterwards, due to there being so much churn.

To expand a bit:

(1) It's really not very interesting to consider pg_stat_statement's
behavior when the table size is too small to avoid 100% turnover;
that's not a regime where the module will provide useful functionality.

(2) It's completely unfair to pretend that a given hashtable size has the
same cost in old and new code; there's more than a 7X difference in the
shared-memory space eaten per hashtable entry.  That does have an impact
on whether people can set the table size large enough to avoid churn.

The theory behind this patch is that for the same shared-memory
consumption you can make the table size enough larger to move the cache
hit rate up significantly, and that that will more than compensate
performance-wise for the increased time needed to make a new table entry.
Now that theory is capable of being disproven, but an artificial example
with a flat query frequency distribution isn't an interesting test case.
We don't care about optimizing performance for such cases, because they
have nothing to do with real-world usage.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread KONDO Mitsumasa
(2014/01/30 8:29), Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
 I could live with just stddev. Not sure others would be so happy.
 
 FWIW, I'd vote for just stddev, on the basis that min/max appear to add
 more to the counter update time than stddev does; you've got
 this:
 
  + e->counters.total_sqtime += total_time * total_time;
 
 versus this:

  + if (e->counters.min_time > total_time || e->counters.min_time 
  == EXEC_TIME_INIT)
  + e->counters.min_time = total_time;
  + if (e->counters.max_time < total_time)
  + e->counters.max_time = total_time;
 
 I think on most modern machines, a float multiply-and-add is pretty
 darn cheap; a branch that might or might not be taken, OTOH, is a
 performance bottleneck.
 
 Similarly, the shared memory footprint hit is more: two new doubles
 for min/max versus one for total_sqtime (assuming we're happy with
 the naive stddev calculation).
 
 If we felt that min/max were of similar value to stddev then this
 would be mere nitpicking.  But since people seem to agree they're
 worth less, I'm thinking the cost/benefit ratio isn't there.
Why are you so worried about the cost in my patch? Do three double variables
consume a lot of memory, or add a lot of calculation cost? My test results
showed that the LWLock bottleneck is an urban legend. If you have a fairer test
pattern, please tell me; I'll run it if the community will judge it fairly.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Peter Geoghegan
On Wed, Jan 29, 2014 at 8:48 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 I'd like to know the truth and the fact in your patch, rather than your
 argument:-)

I see.


 [part of sample sqls file, they are executed in pgbench]
 SELECT
 1-1/9+5*8*6+5+9-6-3-4/9%7-2%7/5-9/7+2+9/7-1%5%9/3-4/4-9/1+5+5/6/5%2+1*2*3-8/8+5-3-8-4/8+5%2*2-2-5%6+4
 SELECT
 1%9%7-8/5%6-1%2*2-7+9*6-2*6-9+1-2*9+6+7*8-4-2*1%3/7-1%4%9+4+7/5+4/2-3-5%8/3/7*6-1%8*6*1/7/2%5*6/2-3-9
 SELECT
 1%3*2/8%6%5-3%1+1/6*6*5/9-2*5+6/6*5-1/2-6%4%8/7%2*7%5%9%5/9%1%1-3-9/2*1+1*6%8/2*4/1+6*7*1+5%6+9*1-9*6

 [methods]
 method 1: with old pgss, pg_stat_statements.max=1000(default)
 method 2: with old pgss with my patch, pg_stat_statements.max=1000(default)
 peter 1 : with new pgss(Peter's patch), pg_stat_statements.max=5000(default)
 peter 2 : with new pgss(Peter's patch), pg_stat_statements.max=1000

 [for reference]
 method 5:with old pgss, pg_stat_statements.max=5000
 method 6:with old pgss with my patch, pg_stat_statements.max=5000

 [results]
  method   |  try1  |  try2  |  try3  | degrade performance ratio
 ---------+--------+--------+--------+--------------------------
  method 1 | 6.546  | 6.558  | 6.638  | 0% (reference score)
  method 2 | 6.527  | 6.556  | 6.574  | 1%
  peter 1  | 5.204  | 5.203  | 5.216  |20%
  peter 2  | 4.241  | 4.236  | 4.262  |35%

  method 5 | 5.931  | 5.848  | 5.872  |11%
  method 6 | 5.794  | 5.792  | 5.776  |12%

So, if we compare the old pg_stat_statements and old default with the
new pg_stat_statements and the new default (why not? The latter still
uses less shared memory than the former), by the standard of this
benchmark there is a regression of about 20%. But you also note that
there is an 11% regression in the old pg_stat_statements against
*itself* when used with a higher pg_stat_statements.max setting of
5,000. This is completely contrary to everyone's prior understanding
that increasing the size of the hash table had a very *positive*
effect on performance by relieving cache pressure and thereby causing
less exclusive locking for an entry_dealloc().

Do you suppose that this very surprising result might just be because
this isn't a very representative benchmark? Nothing ever has the
benefit of *not* having to exclusive lock.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Tom Lane
KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes:
 (2014/01/30 8:29), Tom Lane wrote:
 If we felt that min/max were of similar value to stddev then this
 would be mere nitpicking.  But since people seem to agree they're
 worth less, I'm thinking the cost/benefit ratio isn't there.

 Why are you so worried about the cost in my patch? Do three double variables
 consume a lot of memory, or add a lot of calculation cost? My test results
 showed that the LWLock bottleneck is an urban legend. If you have a fairer test
 pattern, please tell me; I'll run it if the community will judge it fairly.

[ shrug... ]  Standard pgbench tests are useless for measuring microscopic
issues like this one; whatever incremental cost is there will be swamped
by client communication and parse/plan overhead.  You may have proven that
the overhead isn't 10% of round-trip query time, but we knew that without
testing.

If I were trying to measure the costs of these changes, I'd use short
statements executed inside loops in a plpgsql function.  And I'd
definitely do it on a machine with quite a few cores (and a running
session on each one), so that spinlock contention might conceivably
happen often enough to be measurable.

If that still says that the runtime cost is down in the noise, we
could then proceed to debate whether min/max are worth a 10% increase
in the size of the shared pgss hash table.  Because by my count,
that's about what two more doubles would be, now that we removed the
query texts from the hash table --- a table entry is currently 21
doublewords (at least on 64-bit machines) and we're debating increasing
it to 22 or 24.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Wed, Jan 29, 2014 at 8:48 PM, KONDO Mitsumasa
 kondo.mitsum...@lab.ntt.co.jp wrote:
 method   |  try1  |  try2  |  try3  | degrade performance ratio
 ---------+--------+--------+--------+--------------------------
 method 1 | 6.546  | 6.558  | 6.638  | 0% (reference score)
 method 2 | 6.527  | 6.556  | 6.574  | 1%
 peter 1  | 5.204  | 5.203  | 5.216  |20%
 peter 2  | 4.241  | 4.236  | 4.262  |35%
 
 method 5 | 5.931  | 5.848  | 5.872  |11%
 method 6 | 5.794  | 5.792  | 5.776  |12%

 So, if we compare the old pg_stat_statements and old default with the
 new pg_stat_statements and the new default (why not? The latter still
 uses less shared memory than the former), by the standard of this
 benchmark there is a regression of about 20%. But you also note that
 there is an 11% regression in the old pg_stat_statements against
 *itself* when used with a higher pg_stat_statements.max setting of
 5,000. This is completely contrary to everyone's prior understanding
 that increasing the size of the hash table had a very *positive*
 effect on performance by relieving cache pressure and thereby causing
 less exclusive locking for an entry_dealloc().

 Do you suppose that this very surprising result might just be because
 this isn't a very representative benchmark? Nothing ever has the
 benefit of *not* having to exclusive lock.

If I understand this test scenario properly, there are no duplicate
queries, so that each iteration creates a new hashtable entry (possibly
evicting an old one).  And it's a single-threaded test, so that there
can be no benefit from reduced locking.

I don't find the results particularly surprising given that.  We knew
going in that the external-texts patch would slow down creation of a new
hashtable entry: there's no way that writing a query text to a file isn't
slower than memcpy'ing it into shared memory.  The performance argument
for doing it anyway is that by greatly reducing the size of hashtable
entries, it becomes more feasible to size the hashtable large enough so
that it's seldom necessary to evict hashtable entries.  It's always going
to be possible to make the patch look bad by testing cases where no
savings in hashtable evictions is possible; which is exactly what this
test case does.  But I don't think that's relevant to real-world usage.
pg_stat_statements isn't going to be useful unless there's a great deal of
repeated statement execution, so what we should be worrying about is the
case where a table entry exists not the case where it doesn't.  At the
very least, test cases where there are *no* repeats are not representative
of cases people would be using pg_stat_statements with.

As for the measured slowdown with larger hashtable size (but still smaller
than the number of distinct queries in the test), my money is on the
larger table not fitting in L3 cache or something like that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Tom Lane
I wrote:
 If I understand this test scenario properly, there are no duplicate
 queries, so that each iteration creates a new hashtable entry (possibly
 evicting an old one).  And it's a single-threaded test, so that there
 can be no benefit from reduced locking.

After looking more closely, it's *not* single-threaded, but each pgbench
thread is running through the same sequence of 10,000 randomly generated
SQL statements.  So everything depends on how nearly those clients stay
in step.  I bet they'd stay pretty nearly in step, though --- any process
lagging behind would find all the hashtable entries already created, and
thus be able to catch up relative to the ones in the lead which are being
slowed by having to write out their query texts.  So it seems fairly
likely that this scenario is greatly stressing the case where multiple
processes redundantly create the same hashtable entry.  In any case,
while the same table entry does get touched once-per-client over a short
interval, there's no long-term reuse of table entries.  So I still say
this isn't at all representative of real-world use of pg_stat_statements.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Robert Haas
On Thu, Jan 30, 2014 at 12:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 If I understand this test scenario properly, there are no duplicate
 queries, so that each iteration creates a new hashtable entry (possibly
 evicting an old one).  And it's a single-threaded test, so that there
 can be no benefit from reduced locking.

 After looking more closely, it's *not* single-threaded, but each pgbench
 thread is running through the same sequence of 1 randomly generated
 SQL statements.  So everything depends on how nearly those clients stay
 in step.  I bet they'd stay pretty nearly in step, though --- any process
 lagging behind would find all the hashtable entries already created, and
 thus be able to catch up relative to the ones in the lead which are being
 slowed by having to write out their query texts.  So it seems fairly
 likely that this scenario is greatly stressing the case where multiple
 processes redundantly create the same hashtable entry.  In any case,
 while the same table entry does get touched once-per-client over a short
 interval, there's no long-term reuse of table entries.  So I still say
 this isn't at all representative of real-world use of pg_stat_statements.

One could test it with each pgbench thread starting at a random point
in the same sequence and wrapping at the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 One could test it with each pgbench thread starting at a random point
 in the same sequence and wrapping at the end.

Well, the real point is that 10,000 distinct statements all occurring with
exactly the same frequency isn't a realistic scenario: any hashtable size
less than 10,000 necessarily sucks, any size >= 10,000 is perfect.

I'd think that what you want to test is a long-tailed frequency
distribution (probably a 1/N type of law) where a small number of
statements account for most of the hits and there are progressively fewer
uses of less common statements.  What would then be interesting is how
does the performance change as the hashtable size is varied to cover more
or less of that distribution.
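
For instance, the statement mix could be drawn so that statement k is chosen
with probability proportional to 1/k; a quick sketch of such a generator (just
for building the benchmark script, names are arbitrary):

#include <stdlib.h>

/*
 * Pick a statement number in [1, nstatements] with probability proportional
 * to 1/k, so a handful of statements dominate and the rest form a long tail.
 */
static int
pick_statement(int nstatements)
{
    static double *cdf = NULL;
    static int  cached_n = 0;
    double      u;

    if (cdf == NULL || cached_n != nstatements)
    {
        double      total = 0.0;

        free(cdf);
        cdf = malloc(nstatements * sizeof(double));
        cached_n = nstatements;
        for (int k = 1; k <= nstatements; k++)
        {
            total += 1.0 / k;
            cdf[k - 1] = total;
        }
        for (int k = 0; k < nstatements; k++)
            cdf[k] /= total;
    }

    u = (double) rand() / RAND_MAX;     /* uniform in [0, 1] */

    /* a linear scan is fine for a throwaway test harness */
    for (int k = 0; k < nstatements; k++)
        if (u <= cdf[k])
            return k + 1;
    return nstatements;
}

Varying pg_stat_statements.max above and below the point where most of the
cumulative frequency is covered would then show how much the eviction path
really costs.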

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Tom Lane
BTW ... it occurs to me to wonder if it'd be feasible to keep the
query-texts file mmap'd in each backend, thereby reducing the overhead
to write a new text to about the cost of a memcpy, and eliminating the
read cost in pg_stat_statements() altogether.  It's most likely not worth
the trouble; but if a more-realistic benchmark test shows that we actually
have a performance issue there, that might be a way out without giving up
the functional advantages of Peter's patch.
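
In POSIX terms the idea is roughly the following (hand-waving over growing and
remapping the file, error handling, and the Windows path, so purely a sketch):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the query-text file once per backend; later appends are just memcpy. */
static char *
map_query_texts(const char *path, size_t *size_out)
{
    int         fd = open(path, O_RDWR);
    struct stat st;
    char       *base;

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return NULL;
    }

    base = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping survives the close */
    if (base == MAP_FAILED)
        return NULL;

    *size_out = st.st_size;
    return base;
}

/* Append one query text at a known offset inside the mapping. */
static void
store_query_text(char *base, size_t offset, const char *query, size_t len)
{
    memcpy(base + offset, query, len);
    base[offset + len] = '\0';
}

The remap-when-the-file-grows problem is exactly the complication mentioned
above.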

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Peter Geoghegan
On Thu, Jan 30, 2014 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW ... it occurs to me to wonder if it'd be feasible to keep the
 query-texts file mmap'd in each backend, thereby reducing the overhead
 to write a new text to about the cost of a memcpy, and eliminating the
 read cost in pg_stat_statements() altogether.  It's most likely not worth
 the trouble; but if a more-realistic benchmark test shows that we actually
 have a performance issue there, that might be a way out without giving up
 the functional advantages of Peter's patch.

There could be a worst case for that scheme too, plus we'd have to
figure out how to make it work with Windows, which in the case of
mmap() is not a sunk cost AFAIK. I'm skeptical of the benefit of
pursuing that.

I'm totally unimpressed with the benchmark as things stand. It relies
on keeping 64 clients in perfect lockstep as each executes 10,000
queries that are each unique snowflakes.  And even though they're
unique snowflakes, and even though there are 10,000 of them, everyone
executes the same one at exactly the same time relative to each other,
in exactly the same order as quickly as possible. Even still, the
headline reference score of -35% is completely misleading, because
it isn't comparing like with like in terms of hash table size. This
benchmark incidentally recommends that we reduce the default hash
table size to improve performance when the hash table is under
pressure, which is ludicrous. It's completely backwards. You could
also use the benchmark to demonstrate that the overhead of calling
pg_stat_statements() is ridiculously high, since like creating a new
query text, that only requires a shared lock too.

This is an implausibly bad worst case for larger hash table sizes in
pg_stat_statements generally. 5,000 entries is enough for the large
majority of applications. But for those that hit that limit, in
practice they're still going to find the vast majority of queries
already in the table as they're executed. If they don't, they can
double or triple their max setting, because the shared memory
overhead is so low. No one has any additional overhead once their
query is in the hash table already. In reality, actual applications
could hardly be further from the perfectly uniform distribution of
distinct queries presented here.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Thu, Jan 30, 2014 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW ... it occurs to me to wonder if it'd be feasible to keep the
 query-texts file mmap'd in each backend, thereby reducing the overhead
 to write a new text to about the cost of a memcpy, and eliminating the
 read cost in pg_stat_statements() altogether.  It's most likely not worth
 the trouble; but if a more-realistic benchmark test shows that we actually
 have a performance issue there, that might be a way out without giving up
 the functional advantages of Peter's patch.

 There could be a worst case for that scheme too, plus we'd have to
  figure out how to make it work with Windows, which in the case of
 mmap() is not a sunk cost AFAIK. I'm skeptical of the benefit of
 pursuing that.

Well, it'd be something like #ifdef HAVE_MMAP then use mmap, else
use what's there today.  But I agree that it's not something to pursue
unless we see a more credible demonstration of a problem.  Presumably
any such code would have to be prepared to remap if the file grows
larger than it initially allowed for; and that would be complicated
and perhaps have unwanted failure modes.

 In reality, actual applications
 could hardly be further from the perfectly uniform distribution of
 distinct queries presented here.

Yeah, I made the same point in different words.  I think any realistic
comparison of this code to what we had before needs to measure a workload
with a more plausible query frequency distribution.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-30 Thread Peter Geoghegan
On Thu, Jan 30, 2014 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In reality, actual applications
 could hardly be further from the perfectly uniform distribution of
 distinct queries presented here.

 Yeah, I made the same point in different words.  I think any realistic
 comparison of this code to what we had before needs to measure a workload
 with a more plausible query frequency distribution.

Even though that distribution just doesn't square with anybody's
reality, you can still increase the pg_stat_statements.max setting to
10k and the problem goes away at little cost (a lower setting is
better, but a setting high enough to cache everything is best). But
you're not going to have terribly much use for pg_stat_statements
anyway; if you really do experience churn at that rate with 5,000
possible entries, the module is ipso facto useless, and should be
disabled.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Rajeev rastogi
On 28th January, Mitsumasa KONDO wrote:
  2014-01-26 Simon Riggs si...@2ndquadrant.com
 
    On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote:
  On 21 January 2014 12:54, KONDO Mitsumasa
   kondo.mitsum...@lab.ntt.co.jp wrote:
 Rebased patch is attached.

 Does this fix the Windows bug reported by Kumar on
 20/11/2013 ?
   Sorry, I misunderstood. Mr. Rajeev Rastogi's first name is
   Kumar! I had searched only by e-mail address and subject, not by name...
 
  I don't have windows compiler enviroment, but attached patch might
 be
  fixed.
  Could I ask Mr. Rajeev Rastogi to test my patch again?
 
  I tried to test this but I could not apply the patch on latest git
 HEAD.
  This may be because of recent patch (related to pg_stat_statement
 only
  pg_stat_statements external query text storage ), which got
  committed on 27th January.
 Thank you for trying to test my patch. As you say, the recent commit
 changes pg_stat_statements.c a lot, so I have to revise my patch.
 Please wait a while.
 
 By the way, the latest pg_stat_statements might affect performance on
 Windows systems, because it uses an fflush() call every time it creates
 a new entry in pg_stat_statements, and it calls fread() many times to
 warm the file cache. It works well on Linux, but I'm not sure about
 Windows. If you have time, could you test it on your Windows system? If
 it affects performance a lot, we can still change it.


No issue; please share the test cases with me and I will take the performance report.

Thanks and Regards,
Kumar Rajeev Rastogi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Magnus Hagander
On Wed, Jan 29, 2014 at 9:03 AM, KONDO Mitsumasa 
kondo.mitsum...@lab.ntt.co.jp wrote:

 (2014/01/29 15:51), Tom Lane wrote:
  KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes:
  By the way, latest pg_stat_statement might affect performance in
 Windows system.
  Because it uses fflush() system call every creating new entry in
  pg_stat_statements, and it calls many fread() to warm file cache.
  This statement doesn't seem to have much to do with the patch as
  committed.  There are no fflush calls, and no notion of warming the
  file cache either.
 Oh, all right.

  We do assume that the OS is smart enough to keep
  a frequently-read file in cache ... is Windows too stupid for that?
 I don't know about that, but I think the Windows cache behavior is poor.
 It seems to always write/read data to/from disk, even when there is plenty
 of memory...
 I'd like to see test results on Windows, if we can get them...


I am quite certain the Windows cache is *not* that stupid, and hasn't been
since the Windows 3.1 days.

If you want to make certain, set  FILE_ATTRIBUTE_TEMPORARY when the file is
opened, then it will work really hard not to write it to disk - harder than
most Linux fs'es do AFAIK. This should of course only be done if we don't
mind it going away :)

As per port/open.c, pg will set this when O_SHORT_LIVED is specified. But
AFAICT, we don't actually use this *anywhere* in the backend? Perhaps we
should for this - and also for the pgstat files?

(I may have missed a codepath, only had a minute to do some grepping)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Magnus Hagander
On Wed, Jan 29, 2014 at 8:58 AM, Peter Geoghegan p...@heroku.com wrote:

 I am not opposed in principle to adding new things to the counters
 struct in pg_stat_statements. I just think that the fact that the
 overhead of installing the module on a busy production system is
 currently so low is of *major* value, and therefore any person that
 proposes to expand that struct should be required to very conclusively
 demonstrate that there is no appreciably increase in overhead. Having
 a standard deviation column would be nice, but it's still not that
 important. Maybe when we have portable atomic addition we can afford
 to be much more inclusive of that kind of thing.


Not (at this point) commenting on the details, but a big +1 on the fact
that the overhead is low is a *very* important property. If the overhead
starts growing it will be uninstalled - and that will make things even
harder to diagnose.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Simon Riggs
On 29 January 2014 09:16, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Jan 29, 2014 at 8:58 AM, Peter Geoghegan p...@heroku.com wrote:

 I am not opposed in principle to adding new things to the counters
 struct in pg_stat_statements. I just think that the fact that the
 overhead of installing the module on a busy production system is
 currently so low is of *major* value, and therefore any person that
 proposes to expand that struct should be required to very conclusively
 demonstrate that there is no appreciably increase in overhead. Having
 a standard deviation column would be nice, but it's still not that
 important. Maybe when we have portable atomic addition we can afford
 to be much more inclusive of that kind of thing.


 Not (at this point) commenting on the details, but a big +1 on the fact that
 the overhead is low is a *very* important property. If the overhead starts
 growing it will be uninstalled - and that will make things even harder to
 diagnose.

+1 also. I think almost everybody agrees.

Having said that, I'm not certain that moves us forwards with how to
handle the patch at hand. Here is my attempt at an objective view of
whether or not to apply the patch:

The purpose of the patch is to provide additional help to diagnose
performance issues. Everybody seems to want this as an additional
feature, given the above caveat. The author has addressed the original
performance concerns there and provided some evidence to show that
appears effective.

It is possible that adding this extra straw breaks the camel's back,
but it seems unlikely to be that simple. A new feature to help
performance problems will be of a great use to many people; complaints
about the performance of pg_stat_statements are unlikely to increase
greatly from this change, since such changes would only affect those
right on the threshold of impact. The needs of the many...

Further changes to improve scalability of pg_stat_statements seem
warranted in the future, but I see no reason to block the patch for
that reason.

If somebody has some specific evidence that the impact outweighs the
benefit, please produce it or I am inclined to proceed to commit.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Andrew Dunstan


On 01/29/2014 02:58 AM, Peter Geoghegan wrote:



I am not opposed in principle to adding new things to the counters
struct in pg_stat_statements. I just think that the fact that the
overhead of installing the module on a busy production system is
currently so low is of *major* value, and therefore any person that
proposes to expand that struct should be required to very conclusively
demonstrate that there is no appreciably increase in overhead. Having
a standard deviation column would be nice, but it's still not that
important. Maybe when we have portable atomic addition we can afford
to be much more inclusive of that kind of thing.




Importance is in the eye of the beholder.  As far as I'm concerned, min 
and max are of FAR less value than stddev. If stddev gets left out I'm 
going to be pretty darned annoyed, especially since the benchmarks seem 
to show the marginal cost as being virtually unmeasurable. If those 
aren't enough for you, perhaps you'd like to state what sort of tests 
would satisfy you.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Josh Berkus
On 01/29/2014 04:55 AM, Simon Riggs wrote:
 It is possible that adding this extra straw breaks the camel's back,
 but it seems unlikely to be that simple. A new feature to help
 performance problems will be of a great use to many people; complaints
 about the performance of pg_stat_statements are unlikely to increase
 greatly from this change, since such changes would only affect those
 right on the threshold of impact. The needs of the many...
 
 Further changes to improve scalability of pg_stat_statements seem
 warranted in the future, but I see no reason to block the patch for
 that reason

If we're talking here about min, max and stddev, then +1.  The stats
we've seen so far seem to indicate that any affect on query response
time is within the margin of error for pgbench.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Robert Haas
On Wed, Jan 29, 2014 at 9:06 AM, Andrew Dunstan and...@dunslane.net wrote:
 I am not opposed in principle to adding new things to the counters
 struct in pg_stat_statements. I just think that the fact that the
 overhead of installing the module on a busy production system is
 currently so low is of *major* value, and therefore any person that
 proposes to expand that struct should be required to very conclusively
 demonstrate that there is no appreciably increase in overhead. Having
 a standard deviation column would be nice, but it's still not that
 important. Maybe when we have portable atomic addition we can afford
 to be much more inclusive of that kind of thing.


 Importance is in the eye of the beholder.  As far as I'm concerned, min and
 max are of FAR less value than stddev. If stddev gets left out I'm going to
 be pretty darned annoyed, especially since the benchmarks seem to show the
 marginal cost as being virtually unmeasurable. If those aren't enough for
 you, perhaps you'd like to state what sort of tests would satisfy you.

I agree.  I find it somewhat unlikely that pg_stat_statements is
fragile enough that these few extra counters are going to make much of
a difference.  At the same time, I find min and max a dubious value
proposition.  It seems highly likely to me that stddev will pay its
way, but I'm much less certain about the others.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Josh Berkus
On 01/29/2014 11:54 AM, Robert Haas wrote:
 I agree.  I find it somewhat unlikely that pg_stat_statements is
 fragile enough that these few extra counters are going to make much of
 a difference.  At the same time, I find min and max a dubious value
 proposition.  It seems highly likely to me that stddev will pay its
 way, but I'm much less certain about the others.

What I really want is percentiles, but I'm pretty sure we already shot
that down. ;-)

I could use min/max -- think of performance test runs.  However, I agree
that they're less valuable than stddev.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Peter Geoghegan
On Wed, Jan 29, 2014 at 6:06 AM, Andrew Dunstan and...@dunslane.net wrote:
 Importance is in the eye of the beholder.  As far as I'm concerned, min and
 max are of FAR less value than stddev. If stddev gets left out I'm going to
 be pretty darned annoyed, especially since the benchmarks seem to show the
 marginal cost as being virtually unmeasurable. If those aren't enough for
 you, perhaps you'd like to state what sort of tests would satisfy you.

I certainly agree that stddev is far more valuable than min and max.
I'd be inclined to not accept the latter on the grounds that they
aren't that useful.

So, am I correct in my understanding: The benchmark performed here [1]
was of a standard pgbench SELECT workload, with no other factor
influencing the general overhead (unlike the later test that queried
pg_stat_statements as well)? I'd prefer to have seen the impact on a
32 core system, where spinlock contention would naturally be more
likely to appear, but even still I'm duly satisfied.

There was no testing of the performance impact prior to 6 days ago,
and I'm surprised it took Mitsumasa that long to come up with
something like this, since I raised this concern about 3 months ago.

[1] http://www.postgresql.org/message-id/52e10e6a.5060...@lab.ntt.co.jp
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Andrew Dunstan


On 01/29/2014 04:14 PM, Peter Geoghegan wrote:

On Wed, Jan 29, 2014 at 6:06 AM, Andrew Dunstan and...@dunslane.net wrote:

Importance is in the eye of the beholder.  As far as I'm concerned, min and
max are of FAR less value than stddev. If stddev gets left out I'm going to
be pretty darned annoyed, especially since the benchmarks seem to show the
marginal cost as being virtually unmeasurable. If those aren't enough for
you, perhaps you'd like to state what sort of tests would satisfy you.

I certainly agree that stddev is far more valuable than min and max.
I'd be inclined to not accept the latter on the grounds that they
aren't that useful.

So, am I correct in my understanding: The benchmark performed here [1]
was of a standard pgbench SELECT workload, with no other factor
influencing the general overhead (unlike the later test that queried
pg_stat_statements as well)? I'd prefer to have seen the impact on a
32 core system, where spinlock contention would naturally be more
likely to appear, but even still I'm duly satisfied.


I could live with just stddev. Not sure others would be so happy.

Glad we're good on the performance impact front.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I could live with just stddev. Not sure others would be so happy.

FWIW, I'd vote for just stddev, on the basis that min/max appear to add
more to the counter update time than stddev does; you've got
this:

+   e->counters.total_sqtime += total_time * total_time;

versus this:
  
+   if (e->counters.min_time > total_time || e->counters.min_time == EXEC_TIME_INIT)
+       e->counters.min_time = total_time;
+   if (e->counters.max_time < total_time)
+       e->counters.max_time = total_time;

I think on most modern machines, a float multiply-and-add is pretty
darn cheap; a branch that might or might not be taken, OTOH, is a
performance bottleneck.

Similarly, the shared memory footprint hit is more: two new doubles
for min/max versus one for total_sqtime (assuming we're happy with
the naive stddev calculation).
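(For reference, the naive read-time derivation being assumed here looks roughly
like this; an illustrative C sketch only, not code from any of the patches:)

    #include <math.h>

    /*
     * Illustrative sketch: deriving stddev from the naive accumulators
     * (calls, total_time, total_sqtime) when pg_stat_statements is read.
     */
    static double
    naive_stddev(long long calls, double total_time, double total_sqtime)
    {
        double      mean;
        double      var;

        if (calls < 2)
            return 0.0;

        mean = total_time / calls;
        var = total_sqtime / calls - mean * mean;   /* E[x^2] - (E[x])^2 */

        return (var > 0.0) ? sqrt(var) : 0.0;       /* guard against rounding */
    }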

If we felt that min/max were of similar value to stddev then this
would be mere nitpicking.  But since people seem to agree they're
worth less, I'm thinking the cost/benefit ratio isn't there.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread KONDO Mitsumasa

(2014/01/29 16:58), Peter Geoghegan wrote:

On Tue, Jan 28, 2014 at 10:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:

KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes:

By the way, latest pg_stat_statement might affect performance in Windows system.
Because it uses fflush() system call every creating new entry in
pg_stat_statements, and it calls many fread() to warm file cache.


This statement doesn't seem to have much to do with the patch as
committed.


You could make a strong case for the patch improving performance in
practice for many users, by allowing us to increase the
pg_stat_statements.max default value to 5,000, 5 times the previous
value. The real expense of creating a new entry is the exclusive
locking of the hash table, which blocks *everything* in
pg_stat_statements. That isn't expanded at all, since queries are only
written with a shared lock, which only blocks the creation of new
entries which was already relatively expensive. In particular, it does
not block the maintenance of costs for all already observed entries in
the hash table. It's obvious that simply reducing the pressure on the
cache will improve matters a lot, which for many users the external
texts patch does.

Since Mitsumasa has compared that patch and at least one of his
proposed pg_stat_statements patches on several occasions, I will
re-iterate the difference: any increased overhead from the external
texts patch is paid only when a query is first entered into the hash
table. Then, there is obviously and self-evidently no additional
overhead from contention for any future execution of the same query,
no matter what, indefinitely (the exclusive locking section of
creating a new entry does not do I/O, except in fantastically unlikely
circumstances). So for many of the busy production systems I work
with, that's the difference between a cost paid perhaps tens of
thousands of times a second, and a cost paid every few days or weeks.
Even if he is right about a write() taking an unreasonably large
amount of time on Windows, the consequences felt as pg_stat_statements
LWLock contention would be very limited.

I am not opposed in principle to adding new things to the counters
struct in pg_stat_statements. I just think that the fact that the
overhead of installing the module on a busy production system is
currently so low is of *major* value, and therefore any person that
proposes to expand that struct should be required to very conclusively
demonstrate that there is no appreciably increase in overhead. Having
a standard deviation column would be nice, but it's still not that
important. Maybe when we have portable atomic addition we can afford
to be much more inclusive of that kind of thing.


I'd like to see the facts about your patch, rather than just the argument :-)

So I created a more detailed pg_stat_statements benchmark tool using pgbench.
This tool generates a file of unique SQL statements, and it is only
for measuring pg_stat_statements performance, because it only updates
pg_stat_statements data and doesn't write to disk at all. It's a fair benchmark.
(An illustrative sketch of such a generator follows the sample below.)

[usage]
perl create_sql.pl file.sql
psql -h xxx.xxx.xxx.xxx mitsu-ko -c 'SELECT pg_stat_statements_reset()'
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -n -T 180 -f file.sql

[part of sample sqls file, they are executed in pgbench]
SELECT 
1-1/9+5*8*6+5+9-6-3-4/9%7-2%7/5-9/7+2+9/7-1%5%9/3-4/4-9/1+5+5/6/5%2+1*2*3-8/8+5-3-8-4/8+5%2*2-2-5%6+4
SELECT 
1%9%7-8/5%6-1%2*2-7+9*6-2*6-9+1-2*9+6+7*8-4-2*1%3/7-1%4%9+4+7/5+4/2-3-5%8/3/7*6-1%8*6*1/7/2%5*6/2-3-9
SELECT 
1%3*2/8%6%5-3%1+1/6*6*5/9-2*5+6/6*5-1/2-6%4%8/7%2*7%5%9%5/9%1%1-3-9/2*1+1*6%8/2*4/1+6*7*1+5%6+9*1-9*6

...
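(For illustration only: a hypothetical generator along these lines, written here in C
rather than the actual create_sql.pl Perl script, which is not reproduced in this thread.
The count and seed are arbitrary; varying the operators is what makes each statement
normalize to its own pg_stat_statements entry, since constants alone are jumbled away.)

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical sketch of a unique-query generator for the benchmark. */
    int
    main(int argc, char **argv)
    {
        int         nqueries = (argc > 1) ? atoi(argv[1]) : 10000;  /* arbitrary default */
        const char  ops[] = "+-*/%";

        srand(12345);                       /* fixed seed: reproducible file */
        for (int q = 0; q < nqueries; q++)
        {
            printf("SELECT 1");
            for (int t = 0; t < 50; t++)
                printf("%c%d", ops[rand() % 5], rand() % 9 + 1);    /* digits 1..9 only */
            printf("\n");
        }
        return 0;
    }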

I tested several setups, including the old pgss, the old pgss with my patch, and the new pgss.
The test server and postgresql.conf are the same as in the test I posted earlier in this thread.
The test methods and results are below:

[methods]
method 1: with old pgss, pg_stat_statements.max=1000(default)
method 2: with old pgss with my patch, pg_stat_statements.max=1000(default)
peter 1 : with new pgss(Peter's patch), pg_stat_statements.max=5000(default)
peter 2 : with new pgss(Peter's patch), pg_stat_statements.max=1000

[for reference]
method 5:with old pgss, pg_stat_statements.max=5000
method 6:with old pgss with my patch, pg_stat_statements.max=5000

[results]
 method   |  try1  |  try2  |  try3  | performance degradation ratio
-----------+--------+--------+--------+-------------------------------
 method 1 | 6.546  | 6.558  | 6.638  |  0% (reference score)
 method 2 | 6.527  | 6.556  | 6.574  |  1%
 peter 1  | 5.204  | 5.203  | 5.216  | 20%
 peter 2  | 4.241  | 4.236  | 4.262  | 35%

 method 5 | 5.931  | 5.848  | 5.872  | 11%
 method 6 | 5.794  | 5.792  | 5.776  | 12%


The new pgss degrades performance severely compared with the old pgss, and I cannot
see it scaling.
I understand that his argument was: my patch reduces the time spent under the LWLock in
pg_stat_statements,

so it is good for performance. However, Kondo's patch will degrade performance
in

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread KONDO Mitsumasa

(2014/01/29 17:31), Rajeev rastogi wrote:

On 28th January, Mitsumasa KONDO wrote:

By the way, latest pg_stat_statement might affect performance in
Windows system.
Because it uses fflush() system call every creating new entry in
pg_stat_statements, and it calls many fread() to warm file cache. It
works well in Linux system, but I'm not sure in Windows system. If you
have time, could you test it on your Windows system? If it affects
perfomance a lot, we can still change it.



No Issue, you can share me the test cases, I will take the performance report.

Thank you for your kind!

I posted another opinion in his patch. So please wait for a while, for not waste 
your test time.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-28 Thread KONDO Mitsumasa

(2014/01/28 15:17), Rajeev rastogi wrote:



On 27th January, Mitsumasa KONDO wrote:

2014-01-26 Simon Riggs si...@2ndquadrant.com
mailto:si...@2ndquadrant.com

 On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com
 mailto:si...@2ndquadrant.com wrote:
   On 21 January 2014 12:54, KONDO Mitsumasa

kondo.mitsum...@lab.ntt.co.jp

 mailto:kondo.mitsum...@lab.ntt.co.jp wrote:
   Rebased patch is attached.
  
   Does this fix the Windows bug reported by Kumar on 20/11/2013 ?

Sorry, I was misunderstanding. First name of Mr. Rajeev Rastogi is
Kumar! I searched only e-mail address and title by his name...

I don't have windows compiler enviroment, but attached patch might be
fixed.
Could I ask Mr. Rajeev Rastogi to test my patch again?


I tried to test this but I could not apply the patch on latest git HEAD.
This may be because of recent patch (related to pg_stat_statement only
pg_stat_statements external query text storage ), which got committed
on 27th January.
Thank you for trying to test my patch. As you say, the recent commit changes
pg_stat_statements.c a lot, so I have to revise my patch. Please wait for a while.


By the way, the latest pg_stat_statements might affect performance on Windows systems,
because it uses the fflush() system call every time a new entry is created in
pg_stat_statements, and it calls fread() many times to warm the file cache. It works well
on Linux, but I'm not sure about Windows. If you have time, could you
test it on your Windows system? If it affects performance a lot, we can still
change it.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-28 Thread Tom Lane
KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes:
 By the way, latest pg_stat_statement might affect performance in Windows 
 system. 
 Because it uses fflush() system call every creating new entry in 
 pg_stat_statements, and it calls many fread() to warm file cache.

This statement doesn't seem to have much to do with the patch as
committed.  There are no fflush calls, and no notion of warming the
file cache either.  We do assume that the OS is smart enough to keep
a frequently-read file in cache ... is Windows too stupid for that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-28 Thread Peter Geoghegan
On Tue, Jan 28, 2014 at 10:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes:
 By the way, latest pg_stat_statement might affect performance in Windows 
 system.
 Because it uses fflush() system call every creating new entry in
 pg_stat_statements, and it calls many fread() to warm file cache.

 This statement doesn't seem to have much to do with the patch as
 committed.

You could make a strong case for the patch improving performance in
practice for many users, by allowing us to increase the
pg_stat_statements.max default value to 5,000, 5 times the previous
value. The real expense of creating a new entry is the exclusive
locking of the hash table, which blocks *everything* in
pg_stat_statements. That isn't expanded at all, since queries are only
written with a shared lock, which only blocks the creation of new
entries which was already relatively expensive. In particular, it does
not block the maintenance of costs for all already observed entries in
the hash table. It's obvious that simply reducing the pressure on the
cache will improve matters a lot, which for many users the external
texts patch does.

Since Mitsumasa has compared that patch and at least one of his
proposed pg_stat_statements patches on several occasions, I will
re-iterate the difference: any increased overhead from the external
texts patch is paid only when a query is first entered into the hash
table. Then, there is obviously and self-evidently no additional
overhead from contention for any future execution of the same query,
no matter what, indefinitely (the exclusive locking section of
creating a new entry does not do I/O, except in fantastically unlikely
circumstances). So for many of the busy production systems I work
with, that's the difference between a cost paid perhaps tens of
thousands of times a second, and a cost paid every few days or weeks.
Even if he is right about a write() taking an unreasonably large
amount of time on Windows, the consequences felt as pg_stat_statements
LWLock contention would be very limited.
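(A simplified sketch of the locking pattern being described; the names follow the
surrounding pgss_store() context, but this is a schematic fragment, not the actual
pg_stat_statements source:)

    /*
     * Sketch: statistics updates for already-known queries take only the
     * shared LWLock; the exclusive lock is needed only when a previously
     * unseen query has to be inserted into the hash table.
     */
    LWLockAcquire(pgss->lock, LW_SHARED);
    entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
    if (!entry)
    {
        /* Rare path: trade the shared lock for the exclusive one. */
        LWLockRelease(pgss->lock);
        LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
        entry = entry_alloc(&key, query, query_len, sticky);
    }
    /* per-entry counters are then bumped under the entry's own spinlock */
    LWLockRelease(pgss->lock);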

I am not opposed in principle to adding new things to the counters
struct in pg_stat_statements. I just think that the fact that the
overhead of installing the module on a busy production system is
currently so low is of *major* value, and therefore any person that
proposes to expand that struct should be required to very conclusively
demonstrate that there is no appreciable increase in overhead. Having
a standard deviation column would be nice, but it's still not that
important. Maybe when we have portable atomic addition we can afford
to be much more inclusive of that kind of thing.
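(For context, the per-statement cost in question is the short spinlock-protected
section in which each entry's counters are bumped; roughly, as a simplified
illustrative fragment rather than the actual source:)

    /*
     * Sketch of the per-entry update that any new Counters field would
     * extend (pattern only).
     */
    volatile pgssEntry *e = (volatile pgssEntry *) entry;

    SpinLockAcquire(&e->mutex);
    e->counters.calls += 1;
    e->counters.total_time += total_time;
    /* min_time / max_time / total_sqtime updates would be added here */
    e->counters.rows += rows;
    SpinLockRelease(&e->mutex);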


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-28 Thread KONDO Mitsumasa
(2014/01/29 15:51), Tom Lane wrote:
 KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes:
 By the way, latest pg_stat_statement might affect performance in Windows 
 system.
 Because it uses fflush() system call every creating new entry in
 pg_stat_statements, and it calls many fread() to warm file cache.
 This statement doesn't seem to have much to do with the patch as
 committed.  There are no fflush calls, and no notion of warming the
 file cache either.
Oh, all right.

 We do assume that the OS is smart enough to keep
 a frequently-read file in cache ... is Windows too stupid for that?
I don't know about that, but I think the Windows cache behavior is poor.
It seems to always write/read data to/from disk, even when there is plenty of
memory...
I'd like to see test results on Windows, if we can get them...

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread KONDO Mitsumasa

(2014/01/23 23:18), Andrew Dunstan wrote:

What is more, if the square root calculation is affecting your benchmarks, I
suspect you are benchmarking the wrong thing.
I ran another test with two pgbench clients running at the same time: one runs the
select-only query workload and the other executes a 'SELECT * FROM pg_stat_statements'
query every second. I used the v6 patch in this test.


* Benchmark Commands
$bin/pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 -n &
$bin/pgbench -h xxx.xxx.xxx.xxx mitsu-ko -T 180 -n -f file.sql

** file.sql
SELECT * FROM pg_stat_statements;
\sleep 1s

* Select-only-query Result (Test result is represented by tps.)
       method        |  try1  |  try2  |  try3
---------------------+--------+--------+--------
 with pgss           | 125502 | 125818 | 125809
 with patched pgss   | 125909 | 125699 | 126040


This result shows that my patch has almost the same performance as before.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread KONDO Mitsumasa
(2014/01/26 17:43), Mitsumasa KONDO wrote:
 2014-01-26 Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com
 
 On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com
 mailto:si...@2ndquadrant.com wrote:
   On 21 January 2014 12:54, KONDO Mitsumasa 
 kondo.mitsum...@lab.ntt.co.jp
 mailto:kondo.mitsum...@lab.ntt.co.jp wrote:
   Rebased patch is attached.
  
   Does this fix the Windows bug reported by Kumar on 20/11/2013 ?
Sorry, I misunderstood. Mr. Rajeev Rastogi's first name is Kumar! I searched
only by e-mail address and by title under his name...

I don't have a Windows compiler environment, but the attached patch might fix it.
Could I ask Mr. Rajeev Rastogi to test my patch again?

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***
*** 19,24  CREATE FUNCTION pg_stat_statements(
--- 19,27 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 41,43  CREATE VIEW pg_stat_statements AS
--- 44,52 
SELECT * FROM pg_stat_statements();
  
  GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ /* New Function */
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***
*** 9,14  RETURNS void
--- 9,19 
  AS 'MODULE_PATHNAME'
  LANGUAGE C;
  
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
  CREATE FUNCTION pg_stat_statements(
  OUT userid oid,
  OUT dbid oid,
***
*** 16,21  CREATE FUNCTION pg_stat_statements(
--- 21,29 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 42,44  GRANT SELECT ON pg_stat_statements TO PUBLIC;
--- 50,53 
  
  -- Don't want this to be available to non-superusers.
  REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;
*** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
***
*** 4,8 
--- 4,9 
  \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit
  
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset();
+ ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time();
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements();
  ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements;
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***
*** 78,84  static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
! 
  #define JUMBLE_SIZE1024	/* query serialization buffer size */
  
  /*
--- 78,85 
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
! #define EXEC_TIME_INIT_MIN	DBL_MAX		/* initial execution min time */
! #define	EXEC_TIME_INIT_MAX	-DBL_MAX	/* initial execution max time */
  #define JUMBLE_SIZE1024	/* query serialization buffer size */
  
  /*
***
*** 114,119  typedef struct Counters
--- 115,123 
  {
  	int64		calls;			/* # of times executed */
  	double		total_time;		/* total execution time, in msec */
+ 	double		total_sqtime;		/* cumulated square execution time, in msec */
+ 	double		min_time;		/* minimum execution time, in msec */
+ 	double		max_time;		/* maximum execution time, in msec */
  	int64		rows;			/* total # of retrieved or affected rows */
  	int64		shared_blks_hit;	/* # of shared buffer hits */
  	int64		shared_blks_read;		/* # of shared disk blocks read */
***
*** 237,245  void		_PG_init(void);
--- 241,251 
  void		_PG_fini(void);
  
  Datum		pg_stat_statements_reset(PG_FUNCTION_ARGS);
+ Datum		pg_stat_statements_reset_time(PG_FUNCTION_ARGS);
  Datum		pg_stat_statements(PG_FUNCTION_ARGS);
  
  PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
+ 

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Andrew Dunstan


On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote:

(2014/01/23 23:18), Andrew Dunstan wrote:
What is more, if the square root calculation is affecting your 
benchmarks, I

suspect you are benchmarking the wrong thing.
I run another test that has two pgbench-clients in same time, one is 
select-only-query and another is executing 'SELECT *  
pg_stat_statement' query in every one second. I used v6 patch in this 
test.





The issue of concern is not the performance of pg_stat_statements, AIUI. 
The issue is whether this patch affects performance generally, i.e. is 
there a significant cost in collecting these extra stats. To test this 
you would compare two general pgbench runs, one with the patch applied 
and one without. I personally don't give a tinker's cuss about whether 
the patch slows down pg_stat_statements a bit.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Mitsumasa KONDO
2014-01-27 Andrew Dunstan and...@dunslane.net


 On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote:

 (2014/01/23 23:18), Andrew Dunstan wrote:

 What is more, if the square root calculation is affecting your
 benchmarks, I
 suspect you are benchmarking the wrong thing.

 I run another test that has two pgbench-clients in same time, one is
 select-only-query and another is executing 'SELECT *  pg_stat_statement'
 query in every one second. I used v6 patch in this test.



 The issue of concern is not the performance of pg_stat_statements, AUIU.
 The issue is whether this patch affects performance generally, i.e. is
 there a significant cost in collecting these extra stats. To test this you
 would compare two general pgbench runs, one with the patch applied and one
 without.

I showed the first test results, comparing runs without pg_stat_statements and
without the patch, the other day. They ran on the same server with the same
benchmark settings (clients and scale factor) as today's results. When you
merge and compare the results, you can confirm that my patch does not affect
performance.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Andrew Dunstan

On 01/27/2014 08:48 AM, Mitsumasa KONDO wrote:


 The issue of concern is not the performance of pg_stat_statements,
 AUIU. The issue is whether this patch affects performance
 generally, i.e. is there a significant cost in collecting these
 extra stats. To test this you would compare two general pgbench
 runs, one with the patch applied and one without.

 I showed first test result which is compared with without
 pg_stat_statements and without patch last day. They ran in same server
 and same benchmark settings(clients and scale factor) as today's
 result. When you merge and see the results, you can confirm not to
 affect of performance in my patch.




Yeah, sorry, I misread your message. I think this is good to go from a
performance point of view, although I'm still a bit worried about the
validity of the method (accumulating a sum of squares). OTOH, Welford's
method probably requires slightly more per statement overhead, and
certainly requires one more accumulator per statement. I guess if we
find ill conditioned results it wouldn't be terribly hard to change.

cheers

andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan and...@dunslane.net wrote:
 I personally don't give a tinker's cuss about whether the patch slows down
 pg_stat_statements a bit.

Why not? The assurance that the overhead is generally very low is what
makes it possible to install it widely usually without a second
thought. That's hugely valuable.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Andrew Dunstan


On 01/27/2014 04:37 PM, Peter Geoghegan wrote:

On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan and...@dunslane.net wrote:

I personally don't give a tinker's cuss about whether the patch slows down
pg_stat_statements a bit.

Why not? The assurance that the overhead is generally very low is what
makes it possible to install it widely usually without a second
thought. That's hugely valuable.





I care very much what the module does to the performance of all 
statements. But I don't care much if selecting from pg_stat_statements 
itself is a bit slowed. Perhaps I didn't express myself as clearly as I 
could have.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 2:01 PM, Andrew Dunstan and...@dunslane.net wrote:
 I care very much what the module does to the performance of all statements.
 But I don't care much if selecting from pg_stat_statements itself is a bit
 slowed. Perhaps I didn't express myself as clearly as I could have.

Oh, I see. Of course the overhead of calling the pg_stat_statements()
function is much less important. Actually, I think that calling that
function is going to add a lot of noise to any benchmark aimed at
measuring added overhead as the counters struct is expanded.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Rajeev rastogi


On 27th January, Mitsumasa KONDO wrote:
  2014-01-26 Simon Riggs si...@2ndquadrant.com
  mailto:si...@2ndquadrant.com
 
  On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com
  mailto:si...@2ndquadrant.com wrote:
On 21 January 2014 12:54, KONDO Mitsumasa
 kondo.mitsum...@lab.ntt.co.jp
  mailto:kondo.mitsum...@lab.ntt.co.jp wrote:
Rebased patch is attached.
   
Does this fix the Windows bug reported by Kumar on 20/11/2013 ?
 Sorry, I was misunderstanding. First name of Mr. Rajeev Rastogi is
 Kumar! I searched only e-mail address and title by his name...
 
 I don't have windows compiler enviroment, but attached patch might be
 fixed.
 Could I ask Mr. Rajeev Rastogi to test my patch again?

I tried to test this but I could not apply the patch on the latest git HEAD.
This may be because of a recent patch related to pg_stat_statements
(pg_stat_statements external query text storage), which got committed
on 27th January.


Thanks and Regards,
Kumar Rajeev Rastogi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-26 Thread Simon Riggs
On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote:
 On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp 
 wrote:
 Rebased patch is attached.

 Does this fix the Windows bug reported by Kumar on 20/11/2013 ?

Please respond.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-26 Thread Mitsumasa KONDO
2014-01-26 Simon Riggs si...@2ndquadrant.com

 On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote:
  On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp
 wrote:
  Rebased patch is attached.
 
  Does this fix the Windows bug reported by Kumar on 20/11/2013 ?

 Please respond.

Oh.. Very sorry...

The other day I tried to find Kumar's mail from 20/11/2013, but I couldn't find
it... Could you tell me the e-mail subject?
My patch catches up with the latest 9.4 HEAD.

Regards,
--
Mitsumasa KONDO


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-26 Thread Simon Riggs
On 23 January 2014 12:43, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:

 I tested my patch in pgbench, but I cannot find bottleneck of my latest
 patch.
...
 Attached is latest developping patch. It hasn't been test much yet, but sqrt
 caluclation may be faster.

Thank you for reworking this so that the calculation happens outside the lock.

Given your testing, and my own observation that the existing code
could be reworked to avoid contention if people become concerned, I
think this is now ready for commit, as soon as the last point about
Windows is answered.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-23 Thread KONDO Mitsumasa

(2014/01/23 10:28), Peter Geoghegan wrote:

On Wed, Jan 22, 2014 at 5:28 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

Oh, thanks to inform me. I think essential problem of my patch has bottle
neck in sqrt() function and other division caluculation.


Well, that's a pretty easy theory to test. Just stop calling them (and
do something similar to what we do for current counter fields instead)
and see how much difference it makes.
What do you mean by calling them? I think the heavy part you have in mind is the
pg_stat_statements view, which is invoked by a SELECT query, not the part under the
LWLock that collects statistics via the hook. Right?


I tested my patch with pgbench, but I cannot find a bottleneck in my latest patch.
(Sorry, I haven't tested the SELECT query on the pg_stat_statements view yet...) Here is a
test result.


* Result (Test result is represented by tps.)
       method        |  try1  |  try2  |  try3
---------------------+--------+--------+--------
 without pgss        | 130938 | 131558 | 131796
 with pgss           | 125164 | 125146 | 125358
 with patched pgss   | 126091 | 126681 | 126433


* Test Setting
shared_buffers=1024MB
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.7


* pgbench script
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -i -s 100
psql -h xxx.xxx.xxx.xxx mitsu-ko -c 'CHECKPOINT'
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180


* Server SPEC:
CPU: Xeon E5-2670 1P/8C 2.6GHz #We don't have 32 core cpu...
Memory: 24GB
RAID: i420 2GB cache
Disk: 15K * 6 (RAID 1+0)


Attached is the latest development patch. It hasn't been tested much yet, but the sqrt
calculation may be faster.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***
*** 19,24  CREATE FUNCTION pg_stat_statements(
--- 19,27 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 41,43  CREATE VIEW pg_stat_statements AS
--- 44,52 
SELECT * FROM pg_stat_statements();
  
  GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ /* New Function */
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***
*** 9,14  RETURNS void
--- 9,19 
  AS 'MODULE_PATHNAME'
  LANGUAGE C;
  
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
  CREATE FUNCTION pg_stat_statements(
  OUT userid oid,
  OUT dbid oid,
***
*** 16,21  CREATE FUNCTION pg_stat_statements(
--- 21,29 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 42,44  GRANT SELECT ON pg_stat_statements TO PUBLIC;
--- 50,53 
  
  -- Don't want this to be available to non-superusers.
  REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;
*** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
***
*** 4,8 
--- 4,9 
  \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit
  
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset();
+ ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time();
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements();
  ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements;
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***
*** 78,84  static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
! 
  #define JUMBLE_SIZE1024	/* query serialization buffer size */
  
  /*
--- 78,85 
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
! #define EXEC_TIME_INIT_MIN	DBL_MAX		/* initial execution min time */
! #define	EXEC_TIME_INIT_MAX	-DBL_MAX	/* initial execution max time */
  

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-23 Thread Andrew Dunstan


On 01/22/2014 11:33 PM, KONDO Mitsumasa wrote:

(2014/01/23 12:00), Andrew Dunstan wrote:


On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote:

(2014/01/22 22:26), Robert Haas wrote:

On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
OK, Kondo, please demonstrate benchmarks that show we have 1% 
impact

from this change. Otherwise we may need a config parameter to allow
the calculation.


OK, testing DBT-2 now. However, error range of benchmark might be 
1% higher.

So I show you detail HTML results.


To see any impact from spinlock contention, I think you're pretty much
going to need a machine with 32 cores, I think, and lots of
concurrency.  pgbench -S is probably a better test than DBT-2, because
it leaves out all the writing, so percentage-wise more time will be
spent doing things like updating the pgss hash table.
Oh, thanks to inform me. I think essential problem of my patch has 
bottle neck
in sqrt() function and other division caluculation. I will replcace 
sqrt()
function in math.h to more faster algorithm. And moving unneccessary 
part of
caluculation in LWlocks or other locks. It might take time to 
improvement, so

please wait for a while.



Umm, I have not read the patch, but are you not using Welford's 
method? Its
per-statement overhead should be absolutely tiny (and should not 
compute a square
root at all  per statement - the square root should only be computed 
when the

standard deviation is actually wanted, e.g. when a user examines
pg_stat_statements) See for example
http://www.johndcook.com/standard_deviation.html
Thanks for your advice. I read your example roughly, however, I think 
calculating variance is not so heavy in my patch. Double based sqrt 
caluculation is most heavily in my mind. And I find fast square root 
algorithm that is used in 3D games.

http://en.wikipedia.org/wiki/Fast_inverse_square_root

This page shows inverse square root algorithm, but it can caluculate 
normal square root, and it is faster algorithm at the price of 
precision than general algorithm. I think we want to fast algorithm, 
so it will be suitable.



According to the link I gave above:

   The most obvious way to compute variance then would be to have two
   sums: one to accumulate the sum of the x's and another to accumulate
   the sums of the squares of the x's. If the x's are large and the
   differences between them small, direct evaluation of the equation
   above would require computing a small number as the difference of
   two large numbers, a red flag for numerical computing. The loss of
   precision can be so bad that the expression above evaluates to a
   /negative/ number even though variance is always positive. 


As I read your patch that's what it seems to be doing.

What is more, if the square root calculation is affecting your 
benchmarks, I suspect you are benchmarking the wrong thing. The 
benchmarks should not call for a single square root calculation. What we 
really want to know is what is the overhead from keeping these stats. 
But your total runtime code (i.e. code NOT from calling 
pg_stat_statements()) for stddev appears to be this:


   e->counters.total_sqtime += total_time * total_time;
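
(For comparison, Welford's single-pass update, which avoids the cancellation
problem quoted above, is roughly the following; an illustrative sketch, not code
from any of the patches:)

    #include <math.h>

    /* Illustrative sketch of Welford's online mean/variance update. */
    typedef struct
    {
        long long   n;
        double      mean;
        double      m2;     /* sum of squared deviations from the running mean */
    } welford_t;

    static void
    welford_update(welford_t *w, double x)
    {
        double delta = x - w->mean;

        w->n += 1;
        w->mean += delta / w->n;
        w->m2 += delta * (x - w->mean);     /* uses the already-updated mean */
    }

    static double
    welford_stddev(const welford_t *w)
    {
        return (w->n > 1) ? sqrt(w->m2 / w->n) : 0.0;   /* population stddev */
    }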


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-22 Thread KONDO Mitsumasa

(2014/01/22 9:34), Simon Riggs wrote:

AFAICS, all that has happened is that people have given their opinions
and we've got almost the same identical patch, with a rush-rush
comment to commit even though we've waited months. If you submit a
patch, then you need to listen to feedback and be clear about what you
will do next, if you don't people will learn to ignore you and nobody
wants that.
I think I replied that it would be heavy. If we implement a histogram in
pg_stat_statements, we have to add double-precision arrays for storing the
histogram data, and every time we update the histogram for a statement we must
search the arrays for where the response time falls. That is a very big cost,
assumes a lot of memory, and is too heavy at update time relative to the benefit
we would get from it. So I just added stddev, to keep it as fast as the latest
pg_stat_statements. I got agreement from some people, as you say.



On 21 January 2014 21:19, Peter Geoghegan p...@heroku.com wrote:

On Tue, Jan 21, 2014 at 11:48 AM, Simon Riggs si...@2ndquadrant.com wrote:

I agree with people saying that stddev is better than nothing at all,
so I am inclined to commit this, in spite of the above.


I could live with stddev. But we really ought to be investing in
making pg_stat_statements work well with third-party tools. I am very
wary of enlarging the counters structure, because it is protected by a
spinlock. There has been no attempt to quantify that cost, nor has
anyone even theorized that it is not likely to be appreciable.


OK, Kondo, please demonstrate benchmarks that show we have 1% impact
from this change. Otherwise we may need a config parameter to allow
the calculation.
OK, testing DBT-2 now. However, the error range of the benchmark might be higher than 1%,
so I will show you the detailed HTML results.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-22 Thread Robert Haas
On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 OK, Kondo, please demonstrate benchmarks that show we have 1% impact
 from this change. Otherwise we may need a config parameter to allow
 the calculation.

 OK, testing DBT-2 now. However, error range of benchmark might be 1% higher.
 So I show you detail HTML results.

To see any impact from spinlock contention, I think you're pretty much
going to need a machine with 32 cores, I think, and lots of
concurrency.  pgbench -S is probably a better test than DBT-2, because
it leaves out all the writing, so percentage-wise more time will be
spent doing things like updating the pgss hash table.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-22 Thread KONDO Mitsumasa

(2014/01/22 22:26), Robert Haas wrote:

On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

OK, Kondo, please demonstrate benchmarks that show we have 1% impact
from this change. Otherwise we may need a config parameter to allow
the calculation.


OK, testing DBT-2 now. However, error range of benchmark might be 1% higher.
So I show you detail HTML results.


To see any impact from spinlock contention, I think you're pretty much
going to need a machine with 32 cores, I think, and lots of
concurrency.  pgbench -S is probably a better test than DBT-2, because
it leaves out all the writing, so percentage-wise more time will be
spent doing things like updating the pgss hash table.
Oh, thanks for letting me know. I think the essential problem is that my patch has a
bottleneck in the sqrt() function and the other division calculations. I will replace
the math.h sqrt() function with a faster algorithm, and move the unnecessary parts of
the calculation out from under the LWLocks or other locks. The improvement might take
some time, so please wait for a while.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-22 Thread Peter Geoghegan
On Wed, Jan 22, 2014 at 5:28 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 Oh, thanks to inform me. I think essential problem of my patch has bottle
 neck in sqrt() function and other division caluculation.

Well, that's a pretty easy theory to test. Just stop calling them (and
do something similar to what we do for current counter fields instead)
and see how much difference it makes.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-22 Thread Andrew Dunstan


On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote:

(2014/01/22 22:26), Robert Haas wrote:

On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

OK, Kondo, please demonstrate benchmarks that show we have 1% impact
from this change. Otherwise we may need a config parameter to allow
the calculation.


OK, testing DBT-2 now. However, error range of benchmark might be 1% 
higher.

So I show you detail HTML results.


To see any impact from spinlock contention, I think you're pretty much
going to need a machine with 32 cores, I think, and lots of
concurrency.  pgbench -S is probably a better test than DBT-2, because
it leaves out all the writing, so percentage-wise more time will be
spent doing things like updating the pgss hash table.
Oh, thanks to inform me. I think essential problem of my patch has 
bottle neck in sqrt() function and other division caluculation. I will 
replcace sqrt() function in math.h to more faster algorithm. And 
moving unneccessary part of caluculation in LWlocks or other locks. It 
might take time to improvement, so please wait for a while.




Umm, I have not read the patch, but are you not using Welford's method? 
Its per-statement overhead should be absolutely tiny (and should not 
compute a square root at all  per statement - the square root should 
only be computed when the standard deviation is actually wanted, e.g. 
when a user examines pg_stat_statements) See for example 
http://www.johndcook.com/standard_deviation.html



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-22 Thread KONDO Mitsumasa

(2014/01/23 12:00), Andrew Dunstan wrote:


On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote:

(2014/01/22 22:26), Robert Haas wrote:

On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

OK, Kondo, please demonstrate benchmarks that show we have 1% impact
from this change. Otherwise we may need a config parameter to allow
the calculation.


OK, testing DBT-2 now. However, error range of benchmark might be 1% higher.
So I show you detail HTML results.


To see any impact from spinlock contention, I think you're pretty much
going to need a machine with 32 cores, I think, and lots of
concurrency.  pgbench -S is probably a better test than DBT-2, because
it leaves out all the writing, so percentage-wise more time will be
spent doing things like updating the pgss hash table.

Oh, thanks to inform me. I think essential problem of my patch has bottle neck
in sqrt() function and other division caluculation. I will replcace sqrt()
function in math.h to more faster algorithm. And moving unneccessary part of
caluculation in LWlocks or other locks. It might take time to improvement, so
please wait for a while.



Umm, I have not read the patch, but are you not using Welford's method? Its
per-statement overhead should be absolutely tiny (and should not compute a 
square
root at all  per statement - the square root should only be computed when the
standard deviation is actually wanted, e.g. when a user examines
pg_stat_statements) See for example
http://www.johndcook.com/standard_deviation.html
Thanks for your advice. I read your example briefly; however, I think calculating the
variance is not so heavy in my patch. The double-precision sqrt calculation is what I
consider the heaviest part. And I found a fast square root algorithm that is used in 3D games:

http://en.wikipedia.org/wiki/Fast_inverse_square_root

This page shows an inverse square root algorithm, but it can also be used to compute a
normal square root, and it is faster than the general algorithm at the price of some
precision. I think we want a fast algorithm, so it should be suitable.
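
(For reference, the classic trick from that page looks roughly like this; an
illustrative sketch only, trading precision for speed:)

    #include <stdint.h>
    #include <string.h>

    /*
     * Illustrative sketch of the classic fast inverse square root:
     * an approximate 1/sqrt(x), so sqrt(x) can be had as x * fast_rsqrt(x).
     */
    static float
    fast_rsqrt(float x)
    {
        float   xhalf = 0.5f * x;
        int32_t i;

        memcpy(&i, &x, sizeof(i));          /* reinterpret the float's bits */
        i = 0x5f3759df - (i >> 1);          /* magic initial guess */
        memcpy(&x, &i, sizeof(x));
        x = x * (1.5f - xhalf * x * x);     /* one Newton-Raphson refinement */
        return x;
    }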


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-21 Thread KONDO Mitsumasa

Rebased patch is attached.

pg_stat_statements in PG9.4dev has already changed its columns, so I hope
this patch will be committed in PG9.4dev.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***
*** 19,24  CREATE FUNCTION pg_stat_statements(
--- 19,27 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 41,43  CREATE VIEW pg_stat_statements AS
--- 44,52 
SELECT * FROM pg_stat_statements();
  
  GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ /* New Function */
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***
*** 9,14  RETURNS void
--- 9,19 
  AS 'MODULE_PATHNAME'
  LANGUAGE C;
  
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
  CREATE FUNCTION pg_stat_statements(
  OUT userid oid,
  OUT dbid oid,
***
*** 16,21  CREATE FUNCTION pg_stat_statements(
--- 21,29 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 42,44  GRANT SELECT ON pg_stat_statements TO PUBLIC;
--- 50,53 
  
  -- Don't want this to be available to non-superusers.
  REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;
*** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
***
*** 4,8 
--- 4,9 
  \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit
  
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset();
+ ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time();
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements();
  ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements;
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***
*** 78,83  static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
--- 78,84 
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
+ #define EXEC_TIME_INIT		(-1)		/* initial execution time */
  
  #define JUMBLE_SIZE1024	/* query serialization buffer size */
  
***
*** 114,119  typedef struct Counters
--- 115,123 
  {
  	int64		calls;			/* # of times executed */
  	double		total_time;		/* total execution time, in msec */
+ 	double		total_sqtime;		/* cumulated square execution time, in msec */
+ 	double		min_time;		/* minimum execution time, in msec */
+ 	double		max_time;		/* maximum execution time, in msec */
  	int64		rows;			/* total # of retrieved or affected rows */
  	int64		shared_blks_hit;	/* # of shared buffer hits */
  	int64		shared_blks_read;		/* # of shared disk blocks read */
***
*** 237,245  void		_PG_init(void);
--- 241,251 
  void		_PG_fini(void);
  
  Datum		pg_stat_statements_reset(PG_FUNCTION_ARGS);
+ Datum		pg_stat_statements_reset_time(PG_FUNCTION_ARGS);
  Datum		pg_stat_statements(PG_FUNCTION_ARGS);
  
  PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
+ PG_FUNCTION_INFO_V1(pg_stat_statements_reset_time);
  PG_FUNCTION_INFO_V1(pg_stat_statements);
  
  static void pgss_shmem_startup(void);
***
*** 266,271  static pgssEntry *entry_alloc(pgssHashKey *key, const char *query,
--- 272,278 
  			int query_len, bool sticky);
  static void entry_dealloc(void);
  static void entry_reset(void);
+ static void entry_reset_time(void);
  static void AppendJumble(pgssJumbleState *jstate,
  			 const unsigned char *item, Size size);
  static void JumbleQuery(pgssJumbleState *jstate, Query *query);
***
*** 1046,1051  pgss_store(const char *query, uint32 queryId,
--- 1053,1059 
  
  		e->counters.calls += 1;
  		e->counters.total_time += total_time;
+ 		e->counters.total_sqtime += total_time * total_time;
  		e->counters.rows += rows;
  		e->counters.shared_blks_hit += bufusage->shared_blks_hit;
  		e->counters.shared_blks_read += bufusage->shared_blks_read;
***
*** 1061,1066  pgss_store(const char *query, uint32 

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-21 Thread Simon Riggs
On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
 Rebased patch is attached.

Does this fix the Windows bug reported by Kumar on 20/11/2013 ?

 pg_stat_statements in PG9.4dev has already changed table columns in. So I
 hope this patch will be committed in PG9.4dev.

I've read through the preceding discussion and I'm not very impressed.
Lots of people have spoken about wanting histogram output and I can't
even see a clear statement of whether that will or will not happen.
AFAICS, all that has happened is that people have given their opinions
and we've got almost the same identical patch, with a rush-rush
comment to commit even though we've waited months. If you submit a
patch, then you need to listen to feedback and be clear about what you
will do next, if you don't people will learn to ignore you and nobody
wants that. I should point out that technically this patch is late and
we could reject it solely on that basis, if we wanted to.

I agree with people saying that stddev is better than nothing at all,
so I am inclined to commit this, in spite of the above.

Any objections to commit?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

