[DOCS] pg_stat_statements max size clarification

2017-07-15 Thread baron
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/9.6/static/pgstatstatements.html
Description:

From https://www.postgresql.org/docs/current/static/pgstatstatements.html

pg_stat_statements.max (integer)
pg_stat_statements.max is the maximum number of statements tracked by the
module (i.e., the maximum number of rows in the pg_stat_statements view). If
more distinct statements than that are observed, information about the
least-executed statements is discarded. The default value is 5000. This
parameter can only be set at server start.

I wonder if "least-executed" is correct. I'm not an expert and 
haven't
convinced myself of this by examining the code, but I think after N distinct
queryid's have been seen, then any additional ones are ignored. But that may
not be "least-executed" at all. It's "most-recent" 
instead. I think we need
a new phrase here.

-- 
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs


Re: [DOCS] pg_stat_statements max size clarification

2017-07-16 Thread Baron Schwartz
I will mull over this and see if I can suggest a short phrase that points
to the nuances without explaining them. I think the current phrasing draws
attention to itself by seeming dubious, which creates a kind of cognitive
dissonance for the reader. This report isn't mean to be a bug report on the
functionality.

On Sat, Jul 15, 2017 at 10:17 PM Tom Lane  wrote:

> Peter Geoghegan  writes:
> > On Sat, Jul 15, 2017 at 5:31 PM,   wrote:
> >> I wonder if "least-executed" is correct. I'm not an
> expert and haven't
> >> convinced myself of this by examining the code, but I think after N
> distinct
> >> queryid's have been seen, then any additional ones are ignored. But
> that may
> >> not be "least-executed" at all. It's
> "most-recent" instead. I think we need
> >> a new phrase here.
>
> > It's most executed since tracking for the entry began, with a special
> > heuristic for queries that take a long time to execute, and might
> > therefore consistently be evicted before execution finishes and costs
> > are tallied (see "sticky entries" stuff for full details). Most
> > executed means the total number of calls, which may not be the best
> > thing to evict on the basis of, but certainly isn't too bad.
>
> > The way it actually works is that either 5% of all entries or 10
> > entries are evicted (whichever amount is greatest) once
> > pg_stat_statements.max entries are reached. You're right that this
> > means that the most marginal of entries cannot be usefully tracked,
> > but I doubt that that's much of a problem in practice. It's the usual
> > "recency versus frequency" cache eviction problem, but for query cost
> > tracking purposes if 5,000 entries or 10,000 entries is truly
> > insufficient, then pg_stat_statements probably isn't the right tool.
>
> The short answer, really, is that the algorithm is too complicated to be
> worth explaining in the documentation --- and it's subject to change,
> anyway.  But "least-executed" is a reasonable short description, since
> frequency of use is a major factor in the decisions.  Certainly
> "most-recent" is flat out wrong.
>
> I am not sure whether this complaint is actually meant as a bug report
> that the algorithm didn't seem to work well on the OP's use case.  If so,
> we'd need a lot more details to have any hope of improving it (and the
> documentation comments aren't the right submission forum, either).
>
> regards, tom lane
>