Re: [PROPOSAL] timestamp informations to pg_stat_statements

2018-04-04 Thread legrand legrand
> No, the entries are not removed randomly. We track "usage" for each
> entry (essentially +1 for each time the query got executed, with a decay
> factor applied on each eviction (and we evict 5% at a time).

OK I apologize, I hadn't looked in source code in details, and effectively
the "Usage" algorithm based on execution_time and calls will keep the 
longest queries (and that's a good thing). In some cases (after an 
application release or a postgres patch for exemple) we may be interested 
in leastiests ones.

I'm still convinced that adding those kinds of informations 
(with a planid as done in pg_stat_plans) would be an improvement.

Maybe because having used v$sql, first_load_time, last_active_time 
and plan_hash_value for (too) many years).

It's always important to know when a new plan (good or bad) occurs, and pgss
could
give this kind of informations even without aggregation collector.

Last point (for developer), after testing this patch, in most cases when
calls=1: created and 
last_updated values are identical, they don't even differ to reflect 
execution (nor planing) duration. Is that a precision or coding problem ?

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: [PROPOSAL] timestamp informations to pg_stat_statements

2018-03-10 Thread Tomas Vondra


On 03/10/2018 04:43 PM, legrand legrand wrote:
> +1
> Having the time of first occurence of a statement is very usefull
> for trouble shouting, it permits for exemple to retrieve the order of
> operations in some complex cases (and thoses informations aren't
> taken by any third party collecting tool, that will only be able to
> provide a time range of occurence).
> 

I really don't see how this would be useful in reconstructing order of
operations, particularly in complex cases where I'd expect the queries
to be executed repeatedly / interleaved in different ways. Furthermore,
it would only work for the very first execution of all statements, so
you would probably have to reset the stats over and over - which seems
to directly contradict the purpose of pg_stat_statements (aggregation of
data over longer periods of time).

So unfortunately this seems rather useless, and log_statements=all seems
like a much better / reliable approach to achieve that.

I also doubt it really allows computation of averages, e.g. queries per
second, because practical workloads usually have daily/weekly patterns
(and different queries may follow different patterns).

So I agree with Peter Geoghegan that tools regularly snapshotting
pg_stat_statements and processing that are a better approach. I've seen
a bunch of simple scripts doing just that, actually.

> I thougth that pgss rows where removed randomly when max rows was
> reached, wouldn't having last_executed information permit a kind of
> LRU removal ?
> 

No, the entries are not removed randomly. We track "usage" for each
entry (essentially +1 for each time the query got executed, with a decay
factor applied on each eviction (and we evict 5% at a time).

It's not immediately obvious why something based on time of the
first/last execution would be better than the current algorithm.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PROPOSAL] timestamp informations to pg_stat_statements

2018-03-10 Thread legrand legrand
+1
Having the time of first occurence of a statement is very usefull for
trouble shouting,
it permits for exemple to retrieve the order of operations in some complex
cases (and thoses informations aren't taken by any third party collecting
tool, that will only be able to provide a time range of occurence).

I thougth that pgss rows where removed randomly when max rows was reached,
wouldn't having last_executed information permit a kind of LRU removal ?

Regards
PAscal





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html