Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-16 Thread Greg Smith

On 4/13/13 12:44 PM, Tomas Vondra wrote:

I'm currently struggling with (quite uncommon) deployments where
databases are created/dropped regularly (not to mention tables and
indexes), and it's surprisingly difficult to process such stats to get
reasonable values.


Yes, it's a pain.  If you aggregate the table level data available now, 
you'll end up with some missing activity.  Work done between the last 
snapshot and when the drop happened is gone right now, whereas your 
aggregated stats view would preserve that activity.  The real fun is if 
the new table has the same name as the old one, which gives you all the 
negative value headaches a pg_stat_reset introduces too.


It's possible to make a case for why database level aggregate statistics 
are useful.  I don't know that yours is compelling enough to justify 
itself though, given that as you say this is an uncommon situation.  In 
something similar to your setup I've just accepted that I have to save 
the snapshots into a table, will occasionally lose some mid-snapshot 
data, and I use a combination of correction updates to that table and 
SQL window functions to provide a useful view.  It's a pain to do and I 
end up having to customize this approach for seemingly every install, 
but it can be made accurate enough for most uses.
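Roughly, the approach looks like this (a sketch only — the table and column names are made up, and the real thing needs the correction updates mentioned above):

```sql
-- Periodically INSERT a snapshot of pg_stat_database into a history table.
CREATE TABLE stat_db_snapshots AS
SELECT now() AS snap_time, datname, blks_read, blks_hit
FROM pg_stat_database;

-- Turn the cumulative counters into per-interval deltas with a window
-- function.  Negative deltas flag a stats reset or a dropped/recreated
-- database, and those rows need the manual corrections.
SELECT snap_time, datname,
       blks_read - lag(blks_read) OVER w AS blks_read_delta,
       blks_hit  - lag(blks_hit)  OVER w AS blks_hit_delta
FROM stat_db_snapshots
WINDOW w AS (PARTITION BY datname ORDER BY snap_time);
```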


The gigantic hole in this area I was most interested in for 9.4 
development was the lack of write statistics.  Not having 
pg_stat_database.blks_write, pg_statio_user_tables.heap_blks_write or 
pg_statio_user_indexes.idx_blks_write is a crippling operations/planning 
limitation of the database.  From that perspective, now isn't quite the 
right time to add more aggregation on top of that data, since the 
aggregation will make adding additional counters a bit more complicated. 
 It's not a big difference, but thinking in that direction doesn't help 
your suggestion.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-14 Thread Robert Haas
On Sat, Apr 13, 2013 at 9:01 AM, Peter Eisentraut pete...@gmx.net wrote:
 On Sat, 2013-04-06 at 21:51 +0200, Tomas Vondra wrote:
 This more or less works in stable environments, but once you start
 dropping databases (think of hosting with shared DB server) it gets
 unusable because after DROP DATABASE the database suddenly disappears
 from the sum.

 Therefore I do propose tracking the aggregated stats, similar to the
 pg_stat_bgwriter view.

 It seems like this will open a can of worms.  Maybe someone wants
 aggregated stats for pg_stat_user_tables?  Or maybe instead of the sum,
 someone wants to track the average?  etc.  I don't think we should turn
 the statistics collector into a poor man's data warehouse or statistics
 engine.  Couldn't you transfer the data to some other system for
 long-term analysis?  Maybe you could even use event triggers to have
 DROP DATABASE do that automatically.

I think Tomas's point about objects getting dropped is a pretty fair
one, although I agree there are potential cans of worms here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-13 Thread Peter Eisentraut
On Sat, 2013-04-06 at 21:51 +0200, Tomas Vondra wrote:
 This more or less works in stable environments, but once you start
 dropping databases (think of hosting with shared DB server) it gets
 unusable because after DROP DATABASE the database suddenly disappears
 from the sum.
 
 Therefore I do propose tracking the aggregated stats, similar to the
 pg_stat_bgwriter view. 

It seems like this will open a can of worms.  Maybe someone wants
aggregated stats for pg_stat_user_tables?  Or maybe instead of the sum,
someone wants to track the average?  etc.  I don't think we should turn
the statistics collector into a poor man's data warehouse or statistics
engine.  Couldn't you transfer the data to some other system for
long-term analysis?  Maybe you could even use event triggers to have
DROP DATABASE do that automatically.






Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-13 Thread Tomas Vondra
On 13.4.2013 15:01, Peter Eisentraut wrote:
 On Sat, 2013-04-06 at 21:51 +0200, Tomas Vondra wrote:
 This more or less works in stable environments, but once you start
 dropping databases (think of hosting with shared DB server) it gets
 unusable because after DROP DATABASE the database suddenly disappears
 from the sum.

 Therefore I do propose tracking the aggregated stats, similar to the
 pg_stat_bgwriter view. 
 
 It seems like this will open a can of worms.  Maybe someone wants
 aggregated stats for pg_stat_user_tables?  Or maybe instead of the sum,
 someone wants to track the average?  etc.  I don't think we should turn

What I propose is a simple cumulative counter, just like the other
counters we have right now. I don't think tracking an average (or any
other statistic) makes much sense here. And as the number of objects
changes over time (e.g. dbs may be created/dropped), I'm wondering what
the definition of an average would even be.

BTW I've proposed tracking aggregated table/index stats in my second
message in this thread.

 the statistics collector into a poor man's data warehouse or statistics
 engine.  Couldn't you transfer the data to some other system for

I certainly don't want to overcomplicate the stats system, and I don't
think I'm turning it into a DWH or statistics engine. Even with these
aggregated counters, it still requires snapshotting and additional
processing. It's just a bunch of counters.

I'm currently struggling with (quite uncommon) deployments where
databases are created/dropped regularly (not to mention tables and
indexes), and it's surprisingly difficult to process such stats to get
reasonable values.

The point is that this allows tracking data that are otherwise
effectively lost. With the current stats you have to discard intervals
where databases were dropped (because the data disappear, so you don't
know what the actual snapshot diff is). Depending on the number of DB
drops and the snapshot interval, this may very well be most of the time.
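To make that concrete, with snapshots of pg_stat_database saved into a table (called "snaps" here — a made-up name, with one row per snap_time and datname), the databases dropped mid-series can be spotted like this:

```sql
-- Sketch: any database whose last row is older than the most recent
-- snapshot was dropped mid-series, so every aggregate diff spanning
-- its disappearance has to be discarded.
SELECT datname, max(snap_time) AS last_seen
FROM snaps
GROUP BY datname
HAVING max(snap_time) < (SELECT max(snap_time) FROM snaps);
```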

 long-term analysis?  Maybe you could even use event triggers to have
 DROP DATABASE do that automatically.

I don't think event triggers are a good solution, although I'm wondering
how that's supposed to work.

Tomas




Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-13 Thread Alvaro Herrera
Tomas Vondra wrote:
 On 13.4.2013 15:01, Peter Eisentraut wrote:

  long-term analysis?  Maybe you could even use event triggers to have
  DROP DATABASE do that automatically.
 
 I don't think event triggers are a good solution, although I'm wondering
 how that's supposed to work.

It doesn't, because event triggers do not support shared objects
(databases, tablespaces, roles).

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-07 Thread Tomas Vondra
On 6.4.2013 21:51, Tomas Vondra wrote:
 Hi,
 
 I'm regularly using pg_stat_database view to analyze various aspects of
 behavior of the cluster. The #1 issue I'm constantly running into is
 that to get cluster-level view (across all the databases), the table
 needs to be aggregated like this:
 
 SELECT
 SUM(blks_hit) blks_hit,
 SUM(blks_read) blks_read
 FROM pg_stat_database
 
 This more or less works in stable environments, but once you start
 dropping databases (think of hosting with shared DB server) it gets
 unusable because after DROP DATABASE the database suddenly disappears
 from the sum.
 
 Therefore I do propose tracking the aggregated stats, similar to the
 pg_stat_bgwriter view. This does not require new messages (thanks to
 reuse of the existing messages), and I expect the overhead to be
 negligible (a few bytes of storage, minimal CPU).
 
 I think it does not make sense to merge this into pg_stat_bgwriter,
 creating a new view (can't think of a good name though), seems like a
 much better choice to me.

Attached is a first version of the patch, just to illustrate the idea.
It creates a new system view pg_stat_agg_database with aggregated data.
There are no docs, no regression tests etc.

Now, I'm thinking whether we should do something similar with database
object (table/index) stats, i.e. maintain aggregated data for those too.
This might seem silly at first, but consider for example a partitioned
table. It's common to have db-level metrics on idx/seq_scans, but when
you drop a partition (e.g. the oldest one), you may get into the same
trouble as with the database stats (see my previous post).

So I think it would be handy to define table/index stats aggregated at
the db-level, i.e. there would be one row for each database. I don't
think it makes much sense to aggregate vacuum/analyze info (counts and
timestamps), which means 18 BIGINT counters from pg_stat[io]_tables, and
10 BIGINT counters from pg_stat[io]_indexes. That's 224B for each
database, which I believe is negligible.

OTOH it would be very handy to have this info aggregated per-schema and
per-tablespace, but I'd say to do that later in a separate patch.

Opinions?

Tomas
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 57adbf6..28f3ca2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -636,6 +636,27 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
 FROM pg_database D;
 
+CREATE VIEW pg_stat_agg_database AS
+SELECT
+pg_stat_get_all_numbackends() AS numbackends,
+pg_stat_get_all_xact_commit() AS xact_commit,
+pg_stat_get_all_xact_rollback() AS xact_rollback,
+pg_stat_get_all_blocks_fetched() -
+pg_stat_get_all_blocks_hit() AS blks_read,
+pg_stat_get_all_blocks_hit() AS blks_hit,
+pg_stat_get_all_tuples_returned() AS tup_returned,
+pg_stat_get_all_tuples_fetched() AS tup_fetched,
+pg_stat_get_all_tuples_inserted() AS tup_inserted,
+pg_stat_get_all_tuples_updated() AS tup_updated,
+pg_stat_get_all_tuples_deleted() AS tup_deleted,
+pg_stat_get_all_conflict_all() AS conflicts,
+pg_stat_get_all_temp_files() AS temp_files,
+pg_stat_get_all_temp_bytes() AS temp_bytes,
+pg_stat_get_all_deadlocks() AS deadlocks,
+pg_stat_get_all_blk_read_time() AS blk_read_time,
+pg_stat_get_all_blk_write_time() AS blk_write_time,
+pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_database_conflicts AS
 SELECT
 D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 29d986a..121794d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4478,6 +4478,12 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
 	int			i;
 	bool		found;
 
+	/* update aggregated data first */
+	globalStats.n_xact_commit += (PgStat_Counter) (msg->m_xact_commit);
+	globalStats.n_xact_rollback += (PgStat_Counter) (msg->m_xact_rollback);
+	globalStats.n_block_read_time += msg->m_block_read_time;
+	globalStats.n_block_write_time += msg->m_block_write_time;
+
 	dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
 
 	/*
@@ -4561,6 +4567,15 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
 		dbentry->n_tuples_deleted += tabmsg->t_counts.t_tuples_deleted;
 		dbentry->n_blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
 		dbentry->n_blocks_hit += tabmsg->t_counts.t_blocks_hit;
+
+		/* update the aggregated counters too */
+		globalStats.n_blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
+		globalStats.n_blocks_hit += tabmsg->t_counts.t_blocks_hit;
+		globalStats.n_tuples_returned += tabmsg->t_counts.t_tuples_returned;
+		globalStats.n_tuples_fetched += tabmsg->t_counts.t_tuples_fetched;
+		

[HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-06 Thread Tomas Vondra
Hi,

I'm regularly using pg_stat_database view to analyze various aspects of
behavior of the cluster. The #1 issue I'm constantly running into is
that to get cluster-level view (across all the databases), the table
needs to be aggregated like this:

SELECT
SUM(blks_hit) blks_hit,
SUM(blks_read) blks_read
FROM pg_stat_database

This more or less works in stable environments, but once you start
dropping databases (think of hosting with shared DB server) it gets
unusable because after DROP DATABASE the database suddenly disappears
from the sum.

Therefore I do propose tracking the aggregated stats, similar to the
pg_stat_bgwriter view. This does not require new messages (thanks to
reuse of the existing messages), and I expect the overhead to be
negligible (a few bytes of storage, minimal CPU).

I think it does not make sense to merge this into pg_stat_bgwriter,
creating a new view (can't think of a good name though), seems like a
much better choice to me.

And now a bit more detailed explanation of the issues ...

Analysis is usually based on comparing two snapshots (say a few minutes
apart), and the dropped databases make this rather difficult because
they suddenly disappear from the second snapshot.

Say for example there are two databases, A and B, with stats snapshotted
at T1 and T2. Database B is dropped sometime between the snapshots.

So the snapshots look like this:

 time | db | blks_read | blks_hit
------+----+-----------+----------
   T1 |  A |       100 |       50
   T1 |  B |       100 |       50
   T2 |  A |       150 |       75

Now, the aggregated data look like this:

 time | blks_read | blks_hit
------+-----------+----------
   T1 |       200 |      100
   T2 |       150 |       75

So the difference (T2-T1) is

 blks_read | blks_hit
-----------+----------
       -50 |      -25

Yes, negative values do not make much sense. It's very difficult to
detect this kind of behavior and account for it.

It might be possible to solve (some of) these issues with an elaborate
snapshotting system, but that's awkward and difficult to use. Adding a
new system view works much more nicely.
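For the record, detecting the affected rows today looks roughly like this (a sketch only — "snapshot_t1" and "snapshot_t2" are made-up names for two saved copies of pg_stat_database):

```sql
-- A database present at T1 but missing at T2 was dropped in between;
-- its delta comes out NULL (or the aggregate goes negative), so the
-- whole interval has to be discarded or corrected by hand.
SELECT t1.datname,
       t2.blks_read - t1.blks_read AS blks_read_delta,
       (t2.datname IS NULL)        AS dropped_in_interval
FROM snapshot_t1 t1
LEFT JOIN snapshot_t2 t2 USING (datname);
```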

regards
Tomas

