Re: [DOCS] row-level stats and last analyze time

2007-04-26 Thread Neil Conway
On Tue, 2007-04-24 at 17:38 -0400, Neil Conway wrote:
> which included other modifications to reduce the pgstat I/O volume in
> 8.1. I don't think this particular change was wise

I looked into this a bit further:

(1) I believe the reasoning for Tom's earlier change was not to reduce
the I/O between the backend and the pgstat process: it was to keep the
in-memory stats hash tables small, and to reduce the amount of data that
needs to be written to disk. When the only stats messages we get for a
table are VACUUM or ANALYZE messages, we discard the message in the
pgstat daemon.

(2) If stats_row_level is false, there won't be a stats hash entry for
any tables, so we can skip sending the VACUUM or ANALYZE message in the
first place, by the same logic. (This is more debatable if the user just
disabled stats_row_level for the current session, although since only a
super-user can do that, perhaps that's OK.)

(3) I don't like the fact that the current coding is so willing to throw
away VACUUM and ANALYZE pgstat messages. I think it is quite plausible
that the DBA might be interested in the last-VACUUM and last-ANALYZE
information for a table which hasn't had live operations applied to it
recently. The rest of the pgstat code has a similarly disappointing
willingness to silently discard messages it doesn't think are worth
keeping (e.g. pgstat_recv_autovac() is ignored for databases with no
other activity, and pgstat_count_xact_commit/rollback() is a no-op
unless *either* row-level or block-level stats are enabled.)

If we're so concerned about saving space in the stats hash tables for
tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange
to record the timestamps for database-wide VACUUMs and ANALYZEs
separately from table-local VACUUMs and ANALYZEs? That is, a table's
last_vacuum time could effectively be the max of the last database-wide
vacuum time and the last VACUUM on that particular table. (Recording the
time of the last database-wide VACUUM might be worth doing anyway, e.g.
for avoiding wraparound failure).

Comments?

-Neil



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[DOCS] Doc tweak: remove outdated discussion of stats params

2007-04-26 Thread Neil Conway
Attached is a patch that removes a now-false statement from the
documentation: stats_row_level is enabled by default in HEAD, so the
stats traffic in the default configuration is not necessarily small. I
just removed the whole paragraph; anyone want to argue we should
rephrase the paragraph instead?

-Neil

Index: doc/src/sgml/monitoring.sgml
===
RCS file: /home/neilc/postgres/cvs_root/pgsql/doc/src/sgml/monitoring.sgml,v
retrieving revision 1.49
diff -c -p -r1.49 monitoring.sgml
*** doc/src/sgml/monitoring.sgml	26 Apr 2007 22:10:54 -	1.49
--- doc/src/sgml/monitoring.sgml	26 Apr 2007 22:14:58 -
*** postgres: user SET.)

- 
-
- 
-  Since the parameters stats_block_level, and
-  stats_row_level default to false,
-  very few statistics are collected in the default
-  configuration. Enabling either of these configuration
-  variables will significantly increase the amount of useful data
-  produced by the statistics facilities, at the expense of
-  additional run-time overhead.
- 
-
- 
   
  
   
--- 175,180 

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [DOCS] row-level stats and last analyze time

2007-04-26 Thread Alvaro Herrera
Neil Conway wrote:

> (3) I don't like the fact that the current coding is so willing to throw
> away VACUUM and ANALYZE pgstat messages. I think it is quite plausible
> that the DBA might be interested in the last-VACUUM and last-ANALYZE
> information for a table which hasn't had live operations applied to it
> recently. The rest of the pgstat code has a similarly disappointing
> willingness to silently discard messages it doesn't think are worth
> keeping (e.g. pgstat_recv_autovac() is ignored for databases with no
> other activity, and pgstat_count_xact_commit/rollback() is a no-op
> unless *either* row-level or block-level stats are enabled.)

One thing to keep in mind is that autovac drives some decision from
whether the database has a pgstat entry or not.  In particular it means
it doesn't bother processing non-connectable databases, unless they are
close to Xid wraparound.

I think this behavior is a useful one, since usually vacuuming those
databases is a waste of time anyway.  Whether to drive it from pgstat or
from somewhere else is another matter, but if you want to drive it from
another mechanism, keep in mind that the autovacuum launcher (which is
the process that makes this decision) is not connected to any database
so it cannot examine any catalog's content.  There are of course ways
around that: for example you could put the information in the
pg_database flatfile.  But it's something to keep in mind if you want to
change it.

> If we're so concerned about saving space in the stats hash tables for
> tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange
> to record the timestamps for database-wide VACUUMs and ANALYZEs
> separately from table-local VACUUMs and ANALYZEs? That is, a table's
> last_vacuum time could effectively be the max of the last database-wide
> vacuum time and the last VACUUM on that particular table. (Recording the
> time of the last database-wide VACUUM might be worth doing anyway, e.g.
> for avoiding wraparound failure).

Another thing to keep in mind is that autovacuum does not do
database-wide vacuums anymore -- they are not needed.  Xid wraparound
decisions are handled on a table-by-table basis, so information about
when the last database-wide vacuum was is not needed.

Note that Xid wraparound decisions are driven by information in
pg_class.  So it's not a problem that pgstat may lose the info from this
POV.

The bottom line is that the current pgstat behavior and autovacuum are
closely related.  So if you want to change pgstats you should also keep
an eye on how it's going to affect autovac.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match