Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-20 Thread Kevin Grittner
Jeff Janes <jeff.ja...@gmail.com> wrote:
 Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:

 So here's v11.  I intend to commit this shortly.  (I wanted to get it
 out before lunch, but I introduced a silly bug that took me a bit to
 fix.)

 On Windows with Mingw I get this:

 pgstat.c:4389:8: warning: variable 'found' set but not used
 [-Wunused-but-set-variable]

 I don't get that on Linux, but I bet that is just the gcc version
 (4.6.2 vs 4.4.6) rather than the OS.

I get it on Linux with gcc version 4.7.2.

 It looks like it is just a useless variable

Agreed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-20 Thread Alvaro Herrera
Jeff Janes wrote:
 On Mon, Feb 18, 2013 at 7:50 AM, Alvaro Herrera
 <alvhe...@2ndquadrant.com> wrote:
 
  So here's v11.  I intend to commit this shortly.  (I wanted to get it
  out before lunch, but I introduced a silly bug that took me a bit to
  fix.)
 
 On Windows with Mingw I get this:
 
 pgstat.c:4389:8: warning: variable 'found' set but not used
 [-Wunused-but-set-variable]
 
 I don't get that on Linux, but I bet that is just the gcc version
 (4.6.2 vs 4.4.6) rather than the OS.  It looks like it is just a
 useless variable, rather than any possible cause of the Windows make
 check failure (which I can't reproduce).

Hm, I remember looking at that code and thinking that the return there
might not be the best idea because it'd miss running the code that
checks for clock skew; and so the "found" was necessary because the
return was to be taken out.  But on second thought, a database for which the
loop terminates early has already run the clock-skew detection code
recently, so that's probably not worth worrying about.

IOW I will just remove that variable.  Thanks for the notice.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-19 Thread Tomas Vondra

On 19.02.2013 05:46, Alvaro Herrera wrote:

Alvaro Herrera wrote:

I have pushed it now.  Further testing, of course, is always 
welcome.


Mastodon failed:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.


Hmmm, interesting. A single Windows machine, while the other Windows 
machines seem to work fine (although some of them were not built for a 
few weeks).


I'll look into that, but I have no clue why this might happen. Except 
maybe for some unexpected timing issue or something ...


Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-19 Thread Tom Lane
Tomas Vondra <t...@fuzzy.cz> writes:
 On 19.02.2013 05:46, Alvaro Herrera wrote:
 Mastodon failed:
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01
 
 probably worth investigating a bit; we might have broken something.

 Hmmm, interesting. A single Windows machine, while the other Windows 
 machines seem to work fine (although some of them were not built for a 
 few weeks).

Could be random chance --- we've seen the same failure before, eg

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00

regards, tom lane




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-19 Thread Tomas Vondra

On 19.02.2013 11:27, Tom Lane wrote:

Tomas Vondra <t...@fuzzy.cz> writes:

On 19.02.2013 05:46, Alvaro Herrera wrote:

Mastodon failed:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.



Hmmm, interesting. A single Windows machine, while the other Windows
machines seem to work fine (although some of them were not built for a
few weeks).


Could be random chance --- we've seen the same failure before, eg


http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00


Maybe. But why does random chance happen to me only with regression
tests and never with the lottery, like for normal people?


Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-19 Thread Tomas Vondra
On 19.2.2013 11:27, Tom Lane wrote:
 Tomas Vondra <t...@fuzzy.cz> writes:
 On 19.02.2013 05:46, Alvaro Herrera wrote:
 Mastodon failed:
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

 probably worth investigating a bit; we might have broken something.
 
 Hmmm, interesting. A single Windows machine, while the other Windows 
 machines seem to work fine (although some of them were not built for a 
 few weeks).
 
 Could be random chance --- we've seen the same failure before, eg
 
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00
 
   regards, tom lane

I'm looking at that test, and I'm not really sure about a few details.

First, this function seems pretty useless to me:

===
create function wait_for_stats() returns void as $$
  declare
start_time timestamptz := clock_timestamp();
updated bool;
  begin
-- we don't want to wait forever; loop will exit after 30 seconds
for i in 1 .. 300 loop

  -- check to see if indexscan has been sensed
  SELECT (st.idx_scan = pr.idx_scan + 1) INTO updated
FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';

  exit when updated;

  -- wait a little
  perform pg_sleep(0.1);

  -- reset stats snapshot so we can test again
  perform pg_stat_clear_snapshot();

end loop;

-- report time waited in postmaster log (where it won't change test output)
raise log 'wait_for_stats delayed % seconds',
  extract(epoch from clock_timestamp() - start_time);
  end
  $$ language plpgsql;
===

AFAIK the stats remain the same within a transaction, and as a function
runs within a transaction, it will either get new data on the first
iteration, or it will run all 300 of them. I've checked several
buildfarm members and I'm yet to see a single duration between 12ms and
30sec.

So IMHO we can replace the function call with pg_sleep(30) and we'll get
about the same effect.

But this obviously does not answer the question why it failed, although
on both occasions there's this log message:

[50b1b7fa.0568:14] LOG:  wait_for_stats delayed 34.75 seconds

which essentially means the stats were not updated before the call to
wait_for_stats().

Anyway, there are these two failing queries:

===
-- check effects
SELECT st.seq_scan = pr.seq_scan + 1,
   st.seq_tup_read = pr.seq_tup_read + cl.reltuples,
   st.idx_scan = pr.idx_scan + 1,
   st.idx_tup_fetch = pr.idx_tup_fetch + 1
  FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
 WHERE st.relname='tenk2' AND cl.relname='tenk2';
 ?column? | ?column? | ?column? | ?column?
--+--+--+--
 t| t| t| t
(1 row)

SELECT st.heap_blks_read + st.heap_blks_hit = pr.heap_blks + cl.relpages,
   st.idx_blks_read + st.idx_blks_hit = pr.idx_blks + 1
  FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
 WHERE st.relname='tenk2' AND cl.relname='tenk2';
 ?column? | ?column?
--+--
 t| t
(1 row)
===

The first one returns just false values; the second one returns either (t,f) or
(f,f) - for the two failures posted by Alvaro and Tom Lane earlier today.

I'm really wondering how that could happen. The only thing that I can
think of is some strange timing issue, causing lost requests to write
the stats or maybe some of the stats updates. Hmmm, IIRC the stats are
sent over UDP - couldn't that be related?

Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-19 Thread Alvaro Herrera
Tomas Vondra wrote:

 AFAIK the stats remain the same within a transaction, and as a function
 runs within a transaction, it will either get new data on the first
 iteration, or it will run all 300 of them. I've checked several
 buildfarm members and I'm yet to see a single duration between 12ms and
 30sec.

No, there's a call to pg_stat_clear_snapshot() that takes care of that.

 I'm really wondering how that could happen. The only thing that I can
 think of is some strange timing issue, causing lost requests to write
 the stats or maybe some of the stats updates. Hmmm, IIRC the stats are
 sent over UDP - couldn't that be related?

Yes, UDP packet drops can certainly happen.  This is considered a
feature (do not cause backends to block when the network socket to stat
collector is swamped; it's better to lose some stat messages instead).
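That design choice can be sketched as follows. This is illustrative code only, not the actual pgstat.c implementation; the `send_or_drop` and `make_nonblocking_udp_socket` helpers and the port number are assumptions made for the example. The point is the non-blocking send: when the kernel's send buffer is full, the message is simply discarded instead of stalling the backend.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send a datagram without ever blocking the caller: if the kernel's
 * send buffer is full, report the message as dropped instead.
 * Returns 1 if sent, 0 if dropped, -1 on a real error. */
static int
send_or_drop(int sock, const struct sockaddr_in *dst,
             const void *msg, size_t len)
{
    ssize_t n = sendto(sock, msg, len, 0,
                       (const struct sockaddr *) dst, sizeof(*dst));

    if (n == (ssize_t) len)
        return 1;               /* handed to the kernel */
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return 0;               /* buffer full: drop, don't block */
    return -1;                  /* some other error */
}

/* Create a UDP socket and mark it non-blocking, so sendto() can never
 * stall the caller when the socket buffer is swamped. */
static int
make_nonblocking_udp_socket(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    if (sock >= 0)
        fcntl(sock, F_SETFL, fcntl(sock, F_GETFL) | O_NONBLOCK);
    return sock;
}
```

The trade-off is exactly what the paragraph above describes: senders stay fast and never block, at the cost of occasionally losing a stats message.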

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-19 Thread Tomas Vondra
On 19.2.2013 23:31, Alvaro Herrera wrote:
 Tomas Vondra wrote:
 
 AFAIK the stats remain the same within a transaction, and as a
 function runs within a transaction, it will either get new data on
 the first iteration, or it will run all 300 of them. I've checked
 several buildfarm members and I'm yet to see a single duration
 between 12ms and 30sec.
 
 No, there's a call to pg_stat_clear_snapshot() that takes care of
 that.

Aha! Missed that for some reason. Thanks.

 
 I'm really wondering how that could happen. The only thing that I
 can think of is some strange timing issue, causing lost requests to
 write the stats or maybe some of the stats updates. Hmmm, IIRC the
 stats are sent over UDP - couldn't that be related?
 
 yes, UDP packet drops can certainly happen.  This is considered a 
 feature (do not cause backends to block when the network socket to
 stat collector is swamped; it's better to lose some stat messages
 instead).

Is there anything we could add to the test to identify this? Something
that either shows stats were sent and stats arrived (maybe in the
log only), or that some UDP packets were dropped?

Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-19 Thread Jeff Janes
On Mon, Feb 18, 2013 at 7:50 AM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:

 So here's v11.  I intend to commit this shortly.  (I wanted to get it
 out before lunch, but I introduced a silly bug that took me a bit to
 fix.)

On Windows with Mingw I get this:

pgstat.c:4389:8: warning: variable 'found' set but not used
[-Wunused-but-set-variable]

I don't get that on Linux, but I bet that is just the gcc version
(4.6.2 vs 4.4.6) rather than the OS.  It looks like it is just a
useless variable, rather than any possible cause of the Windows make
check failure (which I can't reproduce).

Cheers,

Jeff




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-18 Thread Alvaro Herrera
Tomas Vondra wrote:

 So, here's v10 of the patch (based on the v9+v9a), that implements the
 approach described above.
 
 It turned out to be much easier than I expected (basically just a
 rewrite of the pgstat_read_db_statsfile_timestamp() function).

Thanks.  I'm giving this another look now.  I think the new code means
we no longer need the first_write logic; just let the collector idle
until we get the first request.  (If for some reason we considered that
we should really be publishing initial stats as early as possible, we
could just do a write_statsfiles(allDbs) call before entering the main
loop.  But I don't see any reason to do this.  If you do, please speak
up.)

Also, it seems to me that the new pgstat_db_requested() logic is
slightly bogus (in the inefficient sense, not the incorrect sense):
we should be comparing the timestamp of the request vs.  what's already
on disk instead of blindly returning true if the list is nonempty.  If
the request is older than the file, we don't need to write anything and
can discard the request.  For example, suppose that backend A sends a
request for a DB; we write the file.  If then quickly backend B also
requests stats for the same DB, with the current logic we'd go write the
file, but perhaps backend B would be fine with the file we already
wrote.
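The comparison being described could look roughly like this. It is a sketch with stand-in types, not the actual pgstat.c code: `DBWriteRequest` follows the struct from the patch, but `TimestampTz` is simplified to an integer and `write_needed` is a hypothetical helper.

```c
#include <assert.h>

typedef long long TimestampTz;      /* stand-in for PostgreSQL's type */

typedef struct DBWriteRequest
{
    unsigned int databaseid;        /* OID of the database to write */
    TimestampTz  request_time;      /* when the backend asked */
} DBWriteRequest;

/* Return 1 only if the stats file written at file_write_time is older
 * than the request, i.e. a fresh write is actually needed.  A request
 * that predates the file is already satisfied by the file on disk and
 * can be discarded without writing anything. */
static int
write_needed(const DBWriteRequest *req, TimestampTz file_write_time)
{
    return req->request_time > file_write_time;
}
```

In the backend A / backend B scenario above, B's request would carry a timestamp older than the file A's request caused to be written, so `write_needed` returns 0 and the second write is skipped.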

Another point is that I think there's a better way to handle nonexistent
files, instead of having to read the global file and all the DB records
to find the one we want.  Just try to read the database file, and only
if that fails try to read the global file and compare the timestamp (so
there might be two timestamps for each DB, one in the global file and
one in the DB-specific file.  I don't think this is a problem).  The
point is to avoid having to read the global file if possible.
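The proposed lookup order can be sketched like this (hypothetical helper and sentinel standing in for the real file reads; only the control flow is the point): try the small per-database file first, and touch the global file only on a miss.

```c
#include <assert.h>

typedef long long TimestampTz;     /* stand-in for PostgreSQL's type */

#define TS_MISSING (-1)            /* plays the role of "file not found" */

/* Try the small per-database stats file first; only fall back to
 * scanning the global file when the per-DB file is missing.  Returns 1
 * and sets *ts on success, 0 when neither source has a timestamp. */
static int
get_db_stats_timestamp(TimestampTz per_db_file_ts,
                       TimestampTz global_file_ts,
                       TimestampTz *ts)
{
    if (per_db_file_ts != TS_MISSING)
    {
        *ts = per_db_file_ts;      /* cheap common case: one small file */
        return 1;
    }
    if (global_file_ts != TS_MISSING)
    {
        *ts = global_file_ts;      /* fallback: entry in the global file */
        return 1;
    }
    return 0;
}
```

This is why the two timestamps (one in the global file, one in the per-DB file) are harmless: the per-DB copy is just a fast path over the authoritative global entry.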

So here's v11.  I intend to commit this shortly.  (I wanted to get it
out before lunch, but I introduced a silly bug that took me a bit to
fix.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 38,43 
--- 38,44 
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***
*** 66,73 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* --
   * Timer definitions.
--- 67,75 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* --
   * Timer definitions.
***
*** 115,120  int			pgstat_track_activity_query_size = 1024;
--- 117,123 
   * Built from GUC parameter
   * --
   */
+ char	   *pgstat_stat_directory = NULL;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
  
***
*** 219,229  static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
  
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 222,237 
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request times from backends */
! static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***
*** 252,262  static void pgstat_sighup_handler(SIGNAL_ARGS);
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  	 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
--- 260,275 
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-18 Thread Tomas Vondra
On 18.2.2013 16:50, Alvaro Herrera wrote:
 Tomas Vondra wrote:
 
 So, here's v10 of the patch (based on the v9+v9a), that implements the
 approach described above.

 It turned out to be much easier than I expected (basically just a
 rewrite of the pgstat_read_db_statsfile_timestamp() function).
 
 Thanks.  I'm giving this another look now.  I think the new code means
 we no longer need the first_write logic; just let the collector idle
 until we get the first request.  (If for some reason we considered that
 we should really be publishing initial stats as early as possible, we
 could just do a write_statsfiles(allDbs) call before entering the main
 loop.  But I don't see any reason to do this.  If you do, please speak
 up.)
 
 Also, it seems to me that the new pgstat_db_requested() logic is
 slightly bogus (in the inefficient sense, not the incorrect sense):
 we should be comparing the timestamp of the request vs.  what's already
 on disk instead of blindly returning true if the list is nonempty.  If
 the request is older than the file, we don't need to write anything and
 can discard the request.  For example, suppose that backend A sends a
 request for a DB; we write the file.  If then quickly backend B also
 requests stats for the same DB, with the current logic we'd go write the
 file, but perhaps backend B would be fine with the file we already
 wrote.

Hmmm, you're probably right.

 Another point is that I think there's a better way to handle nonexistent
 files, instead of having to read the global file and all the DB records
 to find the one we want.  Just try to read the database file, and only
 if that fails try to read the global file and compare the timestamp (so
 there might be two timestamps for each DB, one in the global file and
 one in the DB-specific file.  I don't think this is a problem).  The
 point is to avoid having to read the global file if possible.

I don't think that's a good idea. Keeping the timestamps at one place is
a significant simplification, and I don't think it's worth the
additional complexity. And the overhead is minimal.

So my vote on this change is -1.

 
 So here's v11.  I intend to commit this shortly.  (I wanted to get it
 out before lunch, but I introduced a silly bug that took me a bit to
 fix.)

;-)

Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-18 Thread Alvaro Herrera
Tomas Vondra wrote:
 On 18.2.2013 16:50, Alvaro Herrera wrote:

  Also, it seems to me that the new pgstat_db_requested() logic is
  slightly bogus (in the inefficient sense, not the incorrect sense):
  we should be comparing the timestamp of the request vs.  what's already
  on disk instead of blindly returning true if the list is nonempty.  If
  the request is older than the file, we don't need to write anything and
  can discard the request.  For example, suppose that backend A sends a
  request for a DB; we write the file.  If then quickly backend B also
  requests stats for the same DB, with the current logic we'd go write the
  file, but perhaps backend B would be fine with the file we already
  wrote.
 
 Hmmm, you're probably right.

I left it as is for now; I think it warrants revisiting.

  Another point is that I think there's a better way to handle nonexistent
  files, instead of having to read the global file and all the DB records
  to find the one we want.  Just try to read the database file, and only
  if that fails try to read the global file and compare the timestamp (so
  there might be two timestamps for each DB, one in the global file and
  one in the DB-specific file.  I don't think this is a problem).  The
  point is to avoid having to read the global file if possible.
 
 I don't think that's a good idea. Keeping the timestamps at one place is
 a significant simplification, and I don't think it's worth the
 additional complexity. And the overhead is minimal.
 
 So my vote on this change is -1.

Fair enough.

I have pushed it now.  Further testing, of course, is always welcome.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-18 Thread Alvaro Herrera
Alvaro Herrera wrote:

 I have pushed it now.  Further testing, of course, is always welcome.

Mastodon failed:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-17 Thread Tomas Vondra
On 17.2.2013 06:46, Alvaro Herrera wrote:
 Tomas Vondra wrote:
 
 I've been thinking about this (actually I had a really weird dream about
 it this night) and I think it might work like this:

 (1) check the timestamp of the global file - if it's too old, we need
 to send an inquiry or wait a bit longer

 (2) if it's new enough, we need to read it and look for that particular
 database - if it's not found, we have no info about it yet (this is
 the case handled by the dummy files)

 (3) if there's a database stat entry, we need to check the timestamp
 when it was written for the last time - if it's too old, send an
 inquiry and wait a bit longer

 (4) well, we have a recent global file, it contains the database stat
 entry and it's fresh enough - tadaa, we're done
 
 Hmm, yes, I think this is what I was imagining.  I had even considered
 that the timestamp would be removed from the per-db file as you suggest
 here.

So, here's v10 of the patch (based on the v9+v9a), that implements the
approach described above.

It turned out to be much easier than I expected (basically just a
rewrite of the pgstat_read_db_statsfile_timestamp() function).

I've done a fair amount of testing (and will do some more next week) but
it seems to work just fine - no errors, no measurable decrease of
performance etc.

regards
Tomas Vondra
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 9b92ebb..36c0d8b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -38,6 +38,7 @@
 #include "access/xact.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_proc.h"
+#include "lib/ilist.h"
 #include "libpq/ip.h"
 #include "libpq/libpq.h"
 #include "libpq/pqsignal.h"
@@ -66,8 +67,9 @@
  * Paths for the statistics files (relative to installation's $PGDATA).
  * --
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
 
 /* --
  * Timer definitions.
@@ -115,6 +117,8 @@ int			pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * --
  */
+char	   *pgstat_stat_directory = NULL;
+int			pgstat_stat_dbfile_maxlen = 0;
 char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
 
@@ -219,11 +223,16 @@ static int	localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+	slist_node	next;
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request times from backends */
+static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +261,16 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 	 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfiles(bool permanent, bool allDbs);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool deep);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
+static bool pgstat_write_statsfile_needed(void);
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +299,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
-
 /* 
  * Public functions called from postmaster follow
  * 
@@ -541,16 +554,40 @@ startup_failed:
 }
 
 /*
+ * subroutine for pgstat_reset_all
+ */
+static void
+pgstat_reset_remove_files(const char *directory)
+{
+	DIR * dir;
+	struct dirent * entry;
+	char	fname[MAXPGPATH];
+
+	dir = AllocateDir(pgstat_stat_directory);
+	while ((entry = ReadDir(dir, 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-16 Thread Tomas Vondra
On 15.2.2013 01:02, Tomas Vondra wrote:
 On 14.2.2013 22:24, Alvaro Herrera wrote:
 Alvaro Herrera wrote:
 Here's a ninth version of this patch.  (version 8 went unpublished).  I
 have simplified a lot of things and improved some comments; I think I
 understand much of it now.  I think this patch is fairly close to
 committable, but one issue remains, which is this bit in
 pgstat_write_statsfiles():

 I've marked this as Waiting on author for the time being.  I'm going to
 review/work on other patches now, hoping that Tomas will post an updated
 version in time for it to be considered for 9.3.
 
 Sadly I have no idea how to fix that, and I think the solution you
 suggested in the previous messages does not actually do the trick :-(


I've been thinking about this (actually I had a really weird dream about
it this night) and I think it might work like this:

(1) check the timestamp of the global file - if it's too old, we need
to send an inquiry or wait a bit longer

(2) if it's new enough, we need to read it and look for that particular
database - if it's not found, we have no info about it yet (this is
the case handled by the dummy files)

(3) if there's a database stat entry, we need to check the timestamp
when it was written for the last time - if it's too old, send an
inquiry and wait a bit longer

(4) well, we have a recent global file, it contains the database stat
entry and it's fresh enough - tadaa, we're done

At least that's my idea - I haven't tried to implement it yet.
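Those four steps could be sketched as follows. The types, return codes, and staleness threshold are all illustrative assumptions (the idea above had not been implemented at the time of this message); only the decision order matters.

```c
#include <assert.h>

typedef long long TimestampTz;              /* stand-in type */

typedef struct DbStatEntry
{
    unsigned int dbid;
    TimestampTz  stats_timestamp;           /* last write of this DB's stats */
} DbStatEntry;

typedef struct GlobalStatsFile
{
    TimestampTz        stats_timestamp;     /* timestamp of the global file */
    int                ndbs;
    const DbStatEntry *dbs;
} GlobalStatsFile;

enum { STATS_FRESH, STATS_STALE, STATS_NO_ENTRY };

/* Steps (1)-(4): is the global file fresh?  is the DB's entry present?
 * is that entry itself fresh enough? */
static int
check_db_stats(const GlobalStatsFile *gf, unsigned int dbid,
               TimestampTz now, TimestampTz max_age)
{
    int i;

    if (gf->stats_timestamp < now - max_age)
        return STATS_STALE;                 /* (1) send inquiry / wait */

    for (i = 0; i < gf->ndbs; i++)
    {
        if (gf->dbs[i].dbid != dbid)
            continue;
        if (gf->dbs[i].stats_timestamp < now - max_age)
            return STATS_STALE;             /* (3) entry too old */
        return STATS_FRESH;                 /* (4) tadaa, we're done */
    }
    return STATS_NO_ENTRY;                  /* (2) the "dummy file" case */
}
```

Note that STATS_NO_ENTRY is exactly the case the dummy files used to handle: no information about the database yet, rather than stale information.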


I see a few pros and cons of this approach:

pros:

  * no dummy files
  * no timestamps in the per-db files (and thus no sync issues)

cons:

  * the backends / workers will have to re-read the global file just to
check that the per-db file is there and is fresh enough


So far it was sufficient just to peek at the timestamp at the beginning
of the per-db stat file - minimum data read, no CPU-expensive processing
etc. Sadly the more DBs there are, the larger the file gets (thus more
overhead to read it).

OTOH it's not that much data (~180B per entry, so with 1000 dbs
it's just ~180kB) so I don't expect this to be a tremendous issue. And
the pros seem to be quite compelling.

Tomas





Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-16 Thread Alvaro Herrera
Tomas Vondra wrote:

 I've been thinking about this (actually I had a really weird dream about
 it this night) and I think it might work like this:
 
 (1) check the timestamp of the global file - if it's too old, we need
 to send an inquiry or wait a bit longer
 
 (2) if it's new enough, we need to read it and look for that particular
 database - if it's not found, we have no info about it yet (this is
 the case handled by the dummy files)
 
 (3) if there's a database stat entry, we need to check the timestamp
 when it was written for the last time - if it's too old, send an
 inquiry and wait a bit longer
 
 (4) well, we have a recent global file, it contains the database stat
 entry and it's fresh enough - tadaa, we're done

Hmm, yes, I think this is what I was imagining.  I had even considered
that the timestamp would be removed from the per-db file as you suggest
here.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-15 Thread Alvaro Herrera
Tomas Vondra wrote:

 On 14.2.2013 20:23, Alvaro Herrera wrote:

  The problem here is that creating these dummy entries will cause a
  difference in autovacuum behavior.  Autovacuum will skip processing
  databases with no pgstat entry, and the intended reason is that if
  there's no pgstat entry it's because the database doesn't have enough
  activity.  Now perhaps we want to change that, but it should be an
  explicit decision taken after discussion and thought, not side effect
  from an unrelated patch.
 
 I don't see how that changes the autovacuum behavior. Can you explain
 that a bit more?
 
 As I see it, with the old (single-file version) the autovacuum worker
 would get exactly the same thing, i.e. no stats at all.

See in autovacuum.c the calls to pgstat_fetch_stat_dbentry().  Most of
them check for NULL result and act differently depending on that.
Returning a valid (not NULL) entry full of zeroes is not the same.
I didn't actually try to reproduce a problem.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-15 Thread Tomas Vondra
On 15.2.2013 16:38, Alvaro Herrera wrote:
 Tomas Vondra escribió:
 
 On 14.2.2013 20:23, Alvaro Herrera wrote:
 
 The problem here is that creating these dummy entries will cause a
 difference in autovacuum behavior.  Autovacuum will skip processing
 databases with no pgstat entry, and the intended reason is that if
 there's no pgstat entry it's because the database doesn't have enough
 activity.  Now perhaps we want to change that, but it should be an
 explicit decision taken after discussion and thought, not side effect
 from an unrelated patch.

 I don't see how that changes the autovacuum behavior. Can you explain
 that a bit more?

 As I see it, with the old (single-file version) the autovacuum worker
 would get exactly the same thing, i.e. no stats at all.
 
 See in autovacuum.c the calls to pgstat_fetch_stat_dbentry().  Most of
 them check for NULL result and act differently depending on that.
 Returning a valid (not NULL) entry full of zeroes is not the same.
 I didn't actually try to reproduce a problem.

Eh, but why would the patched code return an entry full of zeroes and
not NULL as before? The dummy files serve a single purpose - to confirm
that the collector attempted to write info for the particular database
(and did not find any data for it).

All it contains is a timestamp of the write - nothing else. So the
worker will read the global file (containing list of stats for dbs) and
then will get NULL just like the old code. Because the database is not
there and the patch does not change that at all.

Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Alvaro Herrera
Alvaro Herrera escribió:

 Hm, and I now also realize another bug in this patch: the global stats
 only include database entries for requested databases; but perhaps the
 existing files can serve later requestors just fine for databases that
 already had files; so the global stats file should continue to carry
 entries for them, with the old timestamps.

Actually the code already does things that way -- apologies.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Alvaro Herrera
Here's a ninth version of this patch.  (version 8 went unpublished).  I
have simplified a lot of things and improved some comments; I think I
understand much of it now.  I think this patch is fairly close to
committable, but one issue remains, which is this bit in
pgstat_write_statsfiles():

	/*
	 * In any case, we can just throw away all the db requests, but we need
	 * to write dummy files for databases without a stat entry (it would
	 * cause issues in pgstat_read_db_statsfile_timestamp and pgstat wait
	 * timeouts).  This may happen e.g. for shared DB (oid = 0) right after
	 * initdb.
	 */
	if (!slist_is_empty(&last_statrequests))
	{
		slist_mutable_iter	iter;

		slist_foreach_modify(iter, &last_statrequests)
		{
			DBWriteRequest *req = slist_container(DBWriteRequest, next,
												  iter.cur);

			/*
			 * Create dummy files for requested databases without a proper
			 * dbentry.  It's much easier this way than dealing with multiple
			 * timestamps, possibly existing but not yet written DBs etc.
			 */
			if (!pgstat_get_db_entry(req->databaseid, false))
				pgstat_write_db_dummyfile(req->databaseid);

			pfree(req);
		}

		slist_init(&last_statrequests);
	}

The problem here is that creating these dummy entries will cause a
difference in autovacuum behavior.  Autovacuum will skip processing
databases with no pgstat entry, and the intended reason is that if
there's no pgstat entry it's because the database doesn't have enough
activity.  Now perhaps we want to change that, but it should be an
explicit decision taken after discussion and thought, not side effect
from an unrelated patch.

Hm, and I now also realize another bug in this patch: the global stats
only include database entries for requested databases; but perhaps the
existing files can serve later requestors just fine for databases that
already had files; so the global stats file should continue to carry
entries for them, with the old timestamps.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 38,43 
--- 38,44 
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***
*** 66,73 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* --
   * Timer definitions.
--- 67,75 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* --
   * Timer definitions.
***
*** 115,120  int			pgstat_track_activity_query_size = 1024;
--- 117,124 
   * Built from GUC parameter
   * --
   */
+ char	   *pgstat_stat_directory = NULL;
+ int			pgstat_stat_dbfile_maxlen = 0;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
  
***
*** 219,229  static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
  
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 223,238 
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request times from backends */
! static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***
*** 252,262  static void pgstat_sighup_handler(SIGNAL_ARGS);
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  	 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Josh Berkus
I saw discussion about this on this thread, but I'm not able to figure
out what the answer is: how does this work with moving the stats file,
for example to a RAMdisk?  Specifically, if the user sets
stats_temp_directory, does it continue to work the way it does now?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Alvaro Herrera
Josh Berkus wrote:
 I saw discussion about this on this thread, but I'm not able to figure
 out what the answer is: how does this work with moving the stats file,
 for example to a RAMdisk?  Specifically, if the user sets
 stats_temp_directory, does it continue to work the way it does now?

Of course.  You get more files than previously, but yes.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Alvaro Herrera
Alvaro Herrera escribió:
 Here's a ninth version of this patch.  (version 8 went unpublished).  I
 have simplified a lot of things and improved some comments; I think I
 understand much of it now.  I think this patch is fairly close to
 committable, but one issue remains, which is this bit in
 pgstat_write_statsfiles():
 
  	/*
  	 * In any case, we can just throw away all the db requests, but we need
  	 * to write dummy files for databases without a stat entry (it would
  	 * cause issues in pgstat_read_db_statsfile_timestamp and pgstat wait
  	 * timeouts).  This may happen e.g. for shared DB (oid = 0) right after
  	 * initdb.
  	 */

I think the real way to handle this is to fix backend_read_statsfile().
It's using the old logic of considering existence of the file, but of
course now the file might not exist at all and that doesn't mean we need
to continue kicking the collector to write it.  We need a mechanism to
figure that the collector is just not going to write the file no matter
how hard we kick it ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Alvaro Herrera
Alvaro Herrera escribió:
 Here's a ninth version of this patch.  (version 8 went unpublished).  I
 have simplified a lot of things and improved some comments; I think I
 understand much of it now.  I think this patch is fairly close to
 committable, but one issue remains, which is this bit in
 pgstat_write_statsfiles():

I've marked this as Waiting on author for the time being.  I'm going to
review/work on other patches now, hoping that Tomas will post an updated
version in time for it to be considered for 9.3.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Tomas Vondra
On 14.2.2013 20:43, Josh Berkus wrote:
 I saw discussion about this on this thread, but I'm not able to figure
 out what the answer is: how does this work with moving the stats file,
 for example to a RAMdisk?  Specifically, if the user sets
 stats_temp_directory, does it continue to work the way it does now?

No change in this respect - you can still use RAMdisk, and you'll
actually need less space because the space requirements decreased due to
breaking the single file into multiple pieces.

We're using it this way (on a tmpfs filesystem) and it works like a charm.
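For context, pointing the temporary stats files at a tmpfs mount is a single configuration line; the path below is illustrative, not taken from this thread:

```ini
# postgresql.conf -- illustrative path; any tmpfs-backed directory works,
# and with the patch it now holds one small file per database.
stats_temp_directory = '/run/postgresql/pg_stat_tmp'
```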

regards
Tomas





Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Tomas Vondra
First of all, big thanks for working on this patch and not only
identifying the issues but actually fixing them.

On 14.2.2013 20:23, Alvaro Herrera wrote:
 Here's a ninth version of this patch.  (version 8 went unpublished).  I
 have simplified a lot of things and improved some comments; I think I
 understand much of it now.  I think this patch is fairly close to
 committable, but one issue remains, which is this bit in
 pgstat_write_statsfiles():
 
...

 
 The problem here is that creating these dummy entries will cause a
 difference in autovacuum behavior.  Autovacuum will skip processing
 databases with no pgstat entry, and the intended reason is that if
 there's no pgstat entry it's because the database doesn't have enough
 activity.  Now perhaps we want to change that, but it should be an
 explicit decision taken after discussion and thought, not side effect
 from an unrelated patch.

I don't see how that changes the autovacuum behavior. Can you explain
that a bit more?

As I see it, with the old (single-file version) the autovacuum worker
 would get exactly the same thing, i.e. no stats at all.

Which is exactly what the autovacuum worker gets with the new code, except
that the check for last statfile timestamp uses the per-db file, so we
need to write it. This way the worker is able to read the timestamp and is
happy about it because it gets a fresh file, although it gets no stats later.

Where is the behavior change? Can you provide an example?

kind regards
Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Tomas Vondra
On 14.2.2013 22:24, Alvaro Herrera wrote:
 Alvaro Herrera escribió:
 Here's a ninth version of this patch.  (version 8 went unpublished).  I
 have simplified a lot of things and improved some comments; I think I
 understand much of it now.  I think this patch is fairly close to
 committable, but one issue remains, which is this bit in
 pgstat_write_statsfiles():
 
 I've marked this as Waiting on author for the time being.  I'm going to
 review/work on other patches now, hoping that Tomas will post an updated
 version in time for it to be considered for 9.3.

Sadly I have no idea how to fix that, and I think the solution you
suggested in the previous messages does not actually do the trick :-(

Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-14 Thread Alvaro Herrera
Tomas Vondra escribió:

 I don't see how that changes the autovacuum behavior. Can you explain
 that a bit more?

It might be that I'm all wet on this.  I'll poke at it some more.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-13 Thread Alvaro Herrera
Here's an updated version of this patch that takes care of the issues I
reported previously: no more repalloc() of the requests array; it's now
an slist, which makes the code much more natural IMV.  And no more
messing around with doing sprintf to create a separate sprintf pattern
for the per-db stats file; instead have a function to return the name
that uses just the pgstat dir as stored by GUC.  I think this can be
further simplified still.

I haven't reviewed the rest yet; please do give this a try to confirm
that the speedups previously reported are still there (i.e. I didn't
completely blow it).

Thanks

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 38,43 
--- 38,44 
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***
*** 66,73 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* --
   * Timer definitions.
--- 67,75 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* --
   * Timer definitions.
***
*** 115,120  int			pgstat_track_activity_query_size = 1024;
--- 117,123 
   * Built from GUC parameter
   * --
   */
+ char	   *pgstat_stat_directory = NULL;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
  
***
*** 219,229  static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
  
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 222,237 
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request time from backends for each DB */
! static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***
*** 252,262  static void pgstat_sighup_handler(SIGNAL_ARGS);
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  	 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
--- 260,276 
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  	 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent, bool force);
! static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
! static void pgstat_write_db_dummyfile(Oid databaseid);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
! static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
+ static bool pgstat_write_statsfile_needed(void);
+ static bool pgstat_db_requested(Oid databaseid);
+ 
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
***
*** 285,291  static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
  static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
  static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
  
- 
  /* 
   * Public functions called from postmaster follow
   * 
--- 299,304 
***
*** 549,556  

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-09 Thread Jeff Janes
On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra t...@fuzzy.cz wrote:

 We do not already have this.  There is no relevant spec.  I can't see
 how this could need pg_dump support (but what about pg_upgrade?)

 pg_dump - no

 pg_upgrade - IMHO it should create the pg_stat directory. I don't think
 it could convert statfile into the new format (by splitting it into
 the pieces). I haven't checked but I believe the default behavior is to
 delete it as there might be new fields / slight changes of meaning etc.

Right, I have no concerns with pg_upgrade any more.  The pg_stat directory
will inherently get created by the initdb of the new cluster (because that
initdb will be done with the new binaries, with your patch in place).

pg_upgrade currently doesn't copy over global/pgstat.stat.  So that
means the new cluster doesn't have the activity stats either way,
patch or unpatched.  So if it is not currently a problem it will not
become one under the proposed patch.

Cheers,

Jeff




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-06 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 Nice. Another interesting numbers would be device utilization, average
 I/O speed and required space (which should be ~2x the pgstat.stat size
 without the patch).

 this point is important - with large warehouse with lot of databases
 and tables you have move stat file to some ramdisc - without it you
 lost lot of IO capacity - and it is very important if you need only
 half sized ramdisc

[ blink... ]  I confess I'd not been paying close attention to this
thread, but if that's true I'd say the patch is DOA.  Why should we
accept 2x bloat in the already-far-too-large stats file?  I thought
the idea was just to split up the existing data into multiple files.

regards, tom lane




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-06 Thread Alvaro Herrera
Tom Lane escribió:
 Pavel Stehule pavel.steh...@gmail.com writes:
  Nice. Another interesting numbers would be device utilization, average
  I/O speed and required space (which should be ~2x the pgstat.stat size
  without the patch).
 
  this point is important - with large warehouse with lot of databases
  and tables you have move stat file to some ramdisc - without it you
  lost lot of IO capacity - and it is very important if you need only
  half sized ramdisc
 
 [ blink... ]  I confess I'd not been paying close attention to this
 thread, but if that's true I'd say the patch is DOA.  Why should we
 accept 2x bloat in the already-far-too-large stats file?  I thought
 the idea was just to split up the existing data into multiple files.

I think they are saying just the opposite: maximum disk space
utilization is now half that of the unpatched code.  This is because when we
need to write the temporary file to rename on top of the other one, the
temporary file is not the size of the complete pgstat data collection,
but just that for the requested database.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-06 Thread Pavel Stehule
2013/2/6 Alvaro Herrera alvhe...@2ndquadrant.com:
 Tom Lane escribió:
 Pavel Stehule pavel.steh...@gmail.com writes:
  Nice. Another interesting numbers would be device utilization, average
  I/O speed and required space (which should be ~2x the pgstat.stat size
  without the patch).

  this point is important - with large warehouse with lot of databases
  and tables you have move stat file to some ramdisc - without it you
  lost lot of IO capacity - and it is very important if you need only
  half sized ramdisc

 [ blink... ]  I confess I'd not been paying close attention to this
 thread, but if that's true I'd say the patch is DOA.  Why should we
 accept 2x bloat in the already-far-too-large stats file?  I thought
 the idea was just to split up the existing data into multiple files.

 I think they are saying just the opposite: maximum disk space
 utilization is now half that of the unpatched code.  This is because when we
 need to write the temporary file to rename on top of the other one, the
 temporary file is not the size of the complete pgstat data collection,
 but just that for the requested database.

+1

Pavel


 --
 Álvaro Herrera                http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-06 Thread Tomas Vondra

On 06.02.2013 16:53, Alvaro Herrera wrote:

 Tom Lane escribió:
 
  Pavel Stehule pavel.steh...@gmail.com writes:
   Nice. Another interesting numbers would be device utilization, average
   I/O speed and required space (which should be ~2x the pgstat.stat size
   without the patch).
 
   this point is important - with large warehouse with lot of databases
   and tables you have move stat file to some ramdisc - without it you
   lost lot of IO capacity - and it is very important if you need only
   half sized ramdisc
 
  [ blink... ]  I confess I'd not been paying close attention to this
  thread, but if that's true I'd say the patch is DOA.  Why should we
  accept 2x bloat in the already-far-too-large stats file?  I thought
  the idea was just to split up the existing data into multiple files.
 
 I think they are saying just the opposite: maximum disk space
 utilization is now half that of the unpatched code.  This is because when we
 need to write the temporary file to rename on top of the other one, the
 temporary file is not the size of the complete pgstat data collection,
 but just that for the requested database.


Exactly. And I suspect the current (unpatched) code often requires more than
twice the space because of open file descriptors to already deleted files.


Tomas




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-06 Thread Jeff Janes
On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 5.2.2013 19:23, Jeff Janes wrote:

 If I shutdown the server and blow away the stats with rm
 data/pg_stat/*, it recovers gracefully when I start it back up.  If a
 do rm -r data/pg_stat then it has problems the next time I shut it
 down, but I have no right to do that in the first place.  If I initdb
 a database without this patch, then shut it down and restart with
 binaries that include this patch, and need to manually make the
 pg_stat directory.  Does that mean it needs a catalog bump in order to
 force an initdb?

 Uh, what do you mean by a catalog bump?

There is a catalog number in src/include/catalog/catversion.h, which
when changed forces one to redo initdb.

Formally I guess it is only for system catalog changes, but I thought
it was used for any on-disk changes during development cycles.  I like
it the way it is, as I can use the same data directory for both
versions of the binary (patched and unpatched), and just manually
create or remove the pg_stat directory when changing modes.
That is ideal for testing this patch, probably not ideal for being
committed into the tree along with all the other ongoing devel work.
But I think this is something the committer has to worry about.


 I have a question about its completeness.  When I first start up the
 cluster and have not yet touched it, there is very little stats
 collector activity, either with or without this patch.  When I kick
 the cluster sufficiently (I've been using vacuumdb -a to do that) then
 there is a lot of stats collector activity.  Even once the vacuumdb
 has long finished, this high level of activity continues even though
 the database is otherwise completely idle, and this seems to happen
 for every.  This patch makes that high level of activity much more
 efficient, but it does not reduce the activity.  I don't understand
 why an idle database cannot get back into the state right after
 start-up.

 What do you mean by stats collector activity?  Is it reading/writing a
 lot of data, or is it just using a lot of CPU?

Basically, the launching of new autovac workers and the work that that
entails.  Your patch reduces the size of data that needs to be
written, read, and parsed for every launch, but not the number of
times that that happens.

 Isn't that just natural and expected behavior, because the database
 needs to actually perform ANALYZE to collect the data? Although
 the tables are empty, it costs some CPU / IO and there's a lot of them
 (1000 dbs, each with 200 tables).

It isn't touching the tables at all, just the stats files.

I was wrong about the cluster opening quietly.  It only does that if,
while the cluster was shut down, you remove the statistics files, which
I was doing as I was switching back and forth between patched and
unpatched.

When the cluster opens, any databases that don't have statistics in
the stat file(s) will not get an autovacuum worker process spawned.
They only start getting spawned once someone asks for statistics for
that database.  But then once that happens, that database then gets a
worker spawned for it every naptime (or, at least, as close to that as
the server can keep up with) for eternity, even if that database is
never used again.  The only way to stop this is the unsupported way of
blowing away the permanent stats files.




 I don't think there's a way around this. You may increase the autovacuum
 naptime, but that's about all.

 I do not think that the patch needs to solve this problem in order to
 be accepted, but if it can be addressed while the author and reviewers
 are paying attention to this part of the system, that would be ideal.
 And if not, then we should at least remember that there is future work
 that could be done here.

 If I understand that correctly, you see the same behaviour even without
 the patch, right? In that case I'd vote not to make the patch more
 complex, and try to improve that separately (if it's even possible).

OK.  I just thought that while digging through the code, you might
have a good idea for fixing this part as well.  If so, it would be a
shame for that idea to be lost when you move on to other things.




 I created 1000 databases each with 200 single column tables (x serial
 primary key).

 After vacuumdb -a, I let it idle for a long time to see what steady
 state was reached.

 without the patch:
 vacuumdb -a         real  11m2.624s
 idle steady state:  48.17% user, 39.24% system, 11.78% iowait, 0.81% idle

 with the patch:
 vacuumdb -a         real  6m41.306s
 idle steady state:  7.86% user, 5.00% system, 0.09% iowait, 87% idle

 Nice. Another interesting numbers would be device utilization, average
 I/O speed

I didn't gather that data, as I never figured out how to interpret
those numbers and so don't have much faith in them.  (But I am pretty
impressed with the numbers I do understand.)

 and required space (which should be ~2x the pgstat.stat size
 without 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-06 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Um, what do you mean by catalog bump?

 There is a catalog number in src/include/catalog/catversion.h, which
 when changed forces one to redo initdb.

 Formally I guess it is only for system catalog changes, but I thought
 it was used for any on-disk changes during development cycles.

Yeah, it would be appropriate to bump the catversion if we're creating a
new PGDATA subdirectory.

I'm not excited about keeping code to take care of the lack of such a
subdirectory at runtime, as I gather there is in the current state of
the patch.  Formally, if there were such code, we'd not need a
catversion bump --- the rule of thumb is to change catversion if the new
postgres executable would fail regression tests without a run of the new
initdb.  But it's pretty dumb to keep such code indefinitely, when it
would have no more possible use after the next catversion bump (which is
seldom more than a week or two away during devel phase).

 What do you mean by stats collector activity?  Is it reading/writing a
 lot of data, or is it just using a lot of CPU?

 Basically, the launching of new autovac workers and the work that that
 entails.  Your patch reduces the size of data that needs to be
 written, read, and parsed for every launch, but not the number of
 times that that happens.

It doesn't seem very reasonable to ask this patch to redesign the
autovacuum algorithms, which is essentially what it'll take to improve
that.  That's a completely separate layer of code.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-06 Thread Tomas Vondra
On 7.2.2013 00:40, Tom Lane wrote:
 Jeff Janes jeff.ja...@gmail.com writes:
 On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Um, what do you mean by catalog bump?
 
 There is a catalog number in src/include/catalog/catversion.h, which
 when changed forces one to redo initdb.
 
 Formally I guess it is only for system catalog changes, but I thought
 it was used for any on-disk changes during development cycles.
 
 Yeah, it would be appropriate to bump the catversion if we're creating a
 new PGDATA subdirectory.
 
 I'm not excited about keeping code to take care of the lack of such a
 subdirectory at runtime, as I gather there is in the current state of
 the patch.  Formally, if there were such code, we'd not need a

No, there is nothing to handle that at runtime. The directory is created
at initdb and the patch expects that (and fails if it's gone).

 catversion bump --- the rule of thumb is to change catversion if the new
 postgres executable would fail regression tests without a run of the new
 initdb.  But it's pretty dumb to keep such code indefinitely, when it
 would have no more possible use after the next catversion bump (which is
 seldom more than a week or two away during devel phase).
 
 What do you mean by stats collector activity?  Is it reading/writing a
 lot of data, or is it just using a lot of CPU?
 
 Basically, the launching of new autovac workers and the work that that
 entails.  Your patch reduces the size of data that needs to be
 written, read, and parsed for every launch, but not the number of
 times that that happens.
 
 It doesn't seem very reasonable to ask this patch to redesign the
 autovacuum algorithms, which is essentially what it'll take to improve
 that.  That's a completely separate layer of code.

My opinion, exactly.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-05 Thread Jeff Janes
On Sun, Feb 3, 2013 at 4:51 PM, Tomas Vondra t...@fuzzy.cz wrote:

 LOG:  last_statwrite 11133-08-28 19:22:31.711744+02 is later than
 collector's time 2013-02-04 00:54:21.113439+01 for db 19093
 WARNING:  pgstat wait timeout
 LOG:  last_statwrite 39681-12-23 18:48:48.9093+01 is later than
 collector's time 2013-02-04 00:54:31.424681+01 for db 46494
 FATAL:  could not find block containing chunk 0x2af4a60
 LOG:  statistics collector process (PID 10063) exited with exit code 1
 WARNING:  pgstat wait timeout
 WARNING:  pgstat wait timeout

 I'm not entirely sure where the FATAL came from, but it seems it was
 somehow related to the issue - it was quite reproducible, although I
 don't see how exactly this could happen. The relevant block of code
 looks like this:

 char   *writetime;
 char   *mytime;

 /* Copy because timestamptz_to_str returns a static buffer */
 writetime = pstrdup(timestamptz_to_str(dbentry-stats_timestamp));
 mytime = pstrdup(timestamptz_to_str(cur_ts));
 elog(LOG, last_statwrite %s is later than collector's time %s for 
 db %d, writetime, mytime, dbentry-databaseid);
 pfree(writetime);
 pfree(mytime);

 which seems quite fine to me. I'm not sure how one of the pfree calls
 could fail?

I don't recall seeing the FATAL errors myself, but didn't keep the
logfile around.  (I do recall seeing the pgstat wait timeout).

Are you using windows? pstrdup seems to be different there.

I'm afraid I don't have much to say on the code.  Indeed I never even
look at it (other than grepping for pstrdup just now).  I am taking a
purely experimental approach, Since Alvaro and others have looked at
the code.


 Anyway, attached is a patch that fixes all three issues, i.e.

 1) the un-initialized timestamp
 2) the void omitted from the signature
 3) rename to pg_stat instead of just stat

Thanks.

If I shut down the server and blow away the stats with rm
data/pg_stat/*, it recovers gracefully when I start it back up.  If I
do rm -r data/pg_stat then it has problems the next time I shut it
down, but I have no right to do that in the first place.  If I initdb
a database without this patch, then shut it down and restart with
binaries that include this patch, I need to manually make the
pg_stat directory.  Does that mean it needs a catalog bump in order to
force an initdb?

A review:

It applies cleanly (some offsets, no fuzz), builds without warnings,
and passes make check including with cassert.

The final test done in make check inherently tests this code, and it
passes.  If I intentionally break the patch by making
pgstat_read_db_statsfile add one to the oid it opens, then the test
fails.  So the existing test is at least plausible as a test.

doc/src/sgml/monitoring.sgml needs to be changed: a permanent copy of
the statistics data is stored in the global subdirectory.  I'm not
aware of any other needed changes to the docs.

The big question is whether we want this.  I think we do.  While
having hundreds of databases in a cluster is not recommended, that is
no reason not to handle it better than we do.  I don't see any
down-sides, other than possibly some code uglification.  Some file
systems might not deal well with having lots of small stats files
being rapidly written and rewritten, but it is hard to see how the
current behavior would be more favorable for those systems.

We do not already have this.  There is no relevant spec.  I can't see
how this could need pg_dump support (but what about pg_upgrade?)

I am not aware of any dangers.

I have a question about its completeness.  When I first start up the
cluster and have not yet touched it, there is very little stats
collector activity, either with or without this patch.  When I kick
the cluster sufficiently (I've been using vacuumdb -a to do that) then
there is a lot of stats collector activity.  Even once the vacuumdb
has long finished, this high level of activity continues even though
the database is otherwise completely idle, and this seems to happen
forever.  This patch makes that high level of activity much more
efficient, but it does not reduce the activity.  I don't understand
why an idle database cannot get back into the state right after
start-up.

I do not think that the patch needs to solve this problem in order to
be accepted, but if it can be addressed while the author and reviewers
are paying attention to this part of the system, that would be ideal.
And if not, then we should at least remember that there is future work
that could be done here.

I created 1000 databases each with 200 single column tables (x serial
primary key).

After vacuumdb -a, I let it idle for a long time to see what steady
state was reached.

without the patch:
vacuumdb -a   real11m2.624s
idle steady state:  48.17% user, 39.24% system, 11.78% iowait, 0.81% idle.

with the patch:
vacuumdb -areal6m41.306s
idle steady state:  7.86% user, 5.00% system  0.09% iowait  87% idle,


I also ran pgbench on a 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-05 Thread Tomas Vondra
On 5.2.2013 19:23, Jeff Janes wrote:
 On Sun, Feb 3, 2013 at 4:51 PM, Tomas Vondra t...@fuzzy.cz wrote:
 
 LOG:  last_statwrite 11133-08-28 19:22:31.711744+02 is later than
 collector's time 2013-02-04 00:54:21.113439+01 for db 19093
 WARNING:  pgstat wait timeout
 LOG:  last_statwrite 39681-12-23 18:48:48.9093+01 is later than
 collector's time 2013-02-04 00:54:31.424681+01 for db 46494
 FATAL:  could not find block containing chunk 0x2af4a60
 LOG:  statistics collector process (PID 10063) exited with exit code 1
 WARNING:  pgstat wait timeout
 WARNING:  pgstat wait timeout

 I'm not entirely sure where the FATAL came from, but it seems it was
 somehow related to the issue - it was quite reproducible, although I
 don't see how exactly this could happen. The relevant block of code
 looks like this:

 char   *writetime;
 char   *mytime;

 /* Copy because timestamptz_to_str returns a static buffer */
 writetime = pstrdup(timestamptz_to_str(dbentry-stats_timestamp));
 mytime = pstrdup(timestamptz_to_str(cur_ts));
 elog(LOG, last_statwrite %s is later than collector's time %s for 
 db %d, writetime, mytime, dbentry-databaseid);
 pfree(writetime);
 pfree(mytime);

 which seems quite fine to me. I'm not sure how one of the pfree calls
 could fail?
 
 I don't recall seeing the FATAL errors myself, but didn't keep the
 logfile around.  (I do recall seeing the pgstat wait timeout).
 
 Are you using windows? pstrdup seems to be different there.

Nope. I'll repeat the test with the original patch to find out what went
wrong, just to be sure it was fixed.

 I'm afraid I don't have much to say on the code.  Indeed I never even
 look at it (other than grepping for pstrdup just now).  I am taking a
 purely experimental approach, Since Alvaro and others have looked at
 the code.

Thanks for finding the issue with the uninitialized timestamp!

 If I shut down the server and blow away the stats with rm
 data/pg_stat/*, it recovers gracefully when I start it back up.  If I
 do rm -r data/pg_stat then it has problems the next time I shut it
 down, but I have no right to do that in the first place.  If I initdb
 a database without this patch, then shut it down and restart with
 binaries that include this patch, I need to manually make the
 pg_stat directory.  Does that mean it needs a catalog bump in order to
 force an initdb?

Um, what do you mean by catalog bump?

Anyway, messing with files in the base directory is a bad idea in
general, and I don't think that's a reason to treat the pg_stat
directory differently. If you remove it by hand, you'll be rightfully
punished by various errors.

 A review:
 
 It applies cleanly (some offsets, no fuzz), builds without warnings,
 and passes make check including with cassert.
 
 The final test done in make check inherently tests this code, and it
 passes.  If I intentionally break the patch by making
 pgstat_read_db_statsfile add one to the oid it opens, then the test
 fails.  So the existing test is at least plausible as a test.
 
 doc/src/sgml/monitoring.sgml needs to be changed: a permanent copy of
 the statistics data is stored in the global subdirectory.  I'm not
 aware of any other needed changes to the docs.

Yeah, that should be in the global/pg_stat subdirectory.

 The big question is whether we want this.  I think we do.  While
 having hundreds of databases in a cluster is not recommended, that is
 no reason not to handle it better than we do.  I don't see any
 down-sides, other than possibly some code uglification.  Some file
 systems might not deal well with having lots of small stats files
 being rapidly written and rewritten, but it is hard to see how the
 current behavior would be more favorable for those systems.

If the filesystem has issues with that many entries, it's already hosed
by the contents of the base directory (one directory per database) or of
the database directories (multiple files per table).

Moreover, it's still possible to use tmpfs to handle this at runtime
(which is often the recommended solution with the current code), and use
the actual filesystem only for keeping the data across restarts.

 We do not already have this.  There is no relevant spec.  I can't see
 how this could need pg_dump support (but what about pg_upgrade?)

pg_dump - no

pg_upgrade - IMHO it should create the pg_stat directory. I don't think
it could convert the stat file into the new format (by splitting it into
pieces). I haven't checked, but I believe the default behavior is to
delete it, as there might be new fields / slight changes of meaning etc.

 I am not aware of any dangers.
 
 I have a question about its completeness.  When I first start up the
 cluster and have not yet touched it, there is very little stats
 collector activity, either with or without this patch.  When I kick
 the cluster sufficiently (I've been using vacuumdb -a to do that) then
 there is a lot of stats collector activity.  Even once the vacuumdb
 has long finished, 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-05 Thread Pavel Stehule

 with the patch:
 vacuumdb -areal6m41.306s
 idle steady state:  7.86% user, 5.00% system  0.09% iowait  87% idle,

 Nice. Another interesting numbers would be device utilization, average
 I/O speed and required space (which should be ~2x the pgstat.stat size
 without the patch).


this point is important - with a large warehouse with lots of databases
and tables you have to move the stat file to some ramdisk - without it
you lose a lot of I/O capacity - and it is very important if you then
need only a half-sized ramdisk

Regards

Pavel


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-03 Thread Jeff Janes
On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.

Why stat rather than pg_stat?

The existence of global and base as exceptions already annoys me.
(Especially when I do a tar -xf in my home directory without
remembering the -C flag).  Unless there is some unstated rule behind
what gets a pg_ and what doesn't, I think we should have the pg_.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-03 Thread Jeff Janes
On Sat, Feb 2, 2013 at 2:33 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.

 This has a warning:

 pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
 no prototype before its definition

 I plan to do some performance testing, but that will take a while so I
 wanted to post this before I get distracted.

Running vacuumdb -a on a cluster with 1000 db with 200 tables (x
serial primary key) in each, I get log messages like this:

last_statwrite 23682-06-18 22:36:52.960194-07 is later than
collector's time 2013-02-03 12:49:19.700629-08 for db 16387

Note the bizarre year in the first time stamp.

If it matters, I got this after shutting down the cluster, blowing
away $DATA/stat/*, then restarting it and invoking vacuumdb.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-03 Thread Tomas Vondra
On 3.2.2013 20:46, Jeff Janes wrote:
 On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.
 
 Why stat rather than pg_stat?
 
 The existence of global and base as exceptions already annoys me.
 (Especially when I do a tar -xf in my home directory without
 remembering the -C flag).  Unless there is some unstated rule behind
 what gets a pg_ and what doesn't, I think we should have the pg_.

I don't think there's a clear naming rule. But I think your suggestion
makes perfect sense, especially because we have pg_stat_tmp directory.
So now we'd have pg_stat and pg_stat_tmp, which is quite elegant.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-03 Thread Tomas Vondra
On 2.2.2013 23:33, Jeff Janes wrote:
 On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.
 
 This has a warning:
 
 pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
 no prototype before its definition

I forgot to add void to the function prototype ... Thanks!

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-03 Thread Tomas Vondra
On 3.2.2013 21:54, Jeff Janes wrote:
 On Sat, Feb 2, 2013 at 2:33 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.

 This has a warning:

 pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
 no prototype before its definition

 I plan to do some performance testing, but that will take a while so I
 wanted to post this before I get distracted.
 
 Running vacuumdb -a on a cluster with 1000 db with 200 tables (x
 serial primary key) in each, I get log messages like this:
 
 last_statwrite 23682-06-18 22:36:52.960194-07 is later than
 collector's time 2013-02-03 12:49:19.700629-08 for db 16387
 
 Note the bizarre year in the first time stamp.
 
 If it matters, I got this after shutting down the cluster, blowing
 away $DATA/stat/*, then restarting it and invoking vacuumdb.

I somehow expected that hash_search zeroes all the fields of a new
entry, but looking at pgstat_get_db_entry that obviously is not the
case. So stats_timestamp (which tracks timestamp of the last write for a
DB) was random - that's where the bizarre year values came from.

I've added a proper initialization (to 0), and now it works as expected.

Although the whole sequence of errors I was getting was this:

LOG:  last_statwrite 11133-08-28 19:22:31.711744+02 is later than
collector's time 2013-02-04 00:54:21.113439+01 for db 19093
WARNING:  pgstat wait timeout
LOG:  last_statwrite 39681-12-23 18:48:48.9093+01 is later than
collector's time 2013-02-04 00:54:31.424681+01 for db 46494
FATAL:  could not find block containing chunk 0x2af4a60
LOG:  statistics collector process (PID 10063) exited with exit code 1
WARNING:  pgstat wait timeout
WARNING:  pgstat wait timeout

I'm not entirely sure where the FATAL came from, but it seems it was
somehow related to the issue - it was quite reproducible, although I
don't see how exactly this could happen. The relevant block of code
looks like this:

char   *writetime;
char   *mytime;

/* Copy because timestamptz_to_str returns a static buffer */
writetime = pstrdup(timestamptz_to_str(dbentry-stats_timestamp));
mytime = pstrdup(timestamptz_to_str(cur_ts));
elog(LOG, last_statwrite %s is later than collector's time %s for 
db %d, writetime, mytime, dbentry-databaseid);
pfree(writetime);
pfree(mytime);

which seems quite fine to me. I'm not sure how one of the pfree calls
could fail?

Anyway, attached is a patch that fixes all three issues, i.e.

1) the un-initialized timestamp
2) the void omitted from the signature
3) rename to pg_stat instead of just stat


Tomas
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d318db9..6d0efe9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@
 
 /* --
  * Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temprorary, global and per-database files.
  * --
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME		global/pgstat.stat
-#define PGSTAT_STAT_PERMANENT_TMPFILE		global/pgstat.tmp
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		pg_stat
+#define PGSTAT_STAT_PERMANENT_FILENAME		pg_stat/global.stat
+#define PGSTAT_STAT_PERMANENT_TMPFILE		pg_stat/global.tmp
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME	pg_stat/%d.stat
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE	pg_stat/%d.tmp
 
 /* --
  * Timer definitions.
@@ -115,8 +119,11 @@ int			pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * --
  */
+char	   *pgstat_stat_directory = NULL;
 char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
+char	   *pgstat_stat_db_filename = NULL;
+char	   *pgstat_stat_db_tmpname = NULL;
 
 /*
  * BgWriter global statistics counters (unused in other processes).
@@ -219,11 +226,16 @@ static int	localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 	 Oid tableoid, 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-03 Thread Tomas Vondra
On 1.2.2013 17:19, Alvaro Herrera wrote:
 Tomas Vondra wrote:
 
 diff --git a/src/backend/postmaster/pgstat.c 
 b/src/backend/postmaster/pgstat.c
 index be3adf1..4ec485e 100644
 --- a/src/backend/postmaster/pgstat.c
 +++ b/src/backend/postmaster/pgstat.c
 @@ -64,10 +64,14 @@
  
  /* --
   * Paths for the statistics files (relative to installation's $PGDATA).
 + * Permanent and temprorary, global and per-database files.
 
 Note typo in the line above.
 
 -#define PGSTAT_STAT_PERMANENT_FILENAME  global/pgstat.stat
 -#define PGSTAT_STAT_PERMANENT_TMPFILE   global/pgstat.tmp
 +#define PGSTAT_STAT_PERMANENT_DIRECTORY stat
 +#define PGSTAT_STAT_PERMANENT_FILENAME  stat/global.stat
 +#define PGSTAT_STAT_PERMANENT_TMPFILE   stat/global.tmp
 +#define PGSTAT_STAT_PERMANENT_DB_FILENAME   stat/%d.stat
 +#define PGSTAT_STAT_PERMANENT_DB_TMPFILEstat/%d.tmp
 
 +char   *pgstat_stat_directory = NULL;
  char   *pgstat_stat_filename = NULL;
  char   *pgstat_stat_tmpname = NULL;
 +char   *pgstat_stat_db_filename = NULL;
 +char   *pgstat_stat_db_tmpname = NULL;
 
 I don't like the quoted parts very much; it seems awkward to have the
 snprintf patterns in one place and have them be used in very distant

I don't see that as particularly awkward, but that's a matter of taste.
I still see that as a bunch of constants that are sprintf patterns at
the same time.

 places.  Is there a way to improve that?  Also, if I understand clearly,
 the pgstat_stat_db_filename value needs to be an snprintf pattern too,
 right?  What if it doesn't contain the required % specifier?

Um, yes - it needs to be a pattern too, but the user specifies the
directory (stats_temp_directory) and this is used to derive all the
other values - see assign_pgstat_temp_directory() in guc.c.

 Also, if you can filter this through pgindent, that would be best.  Make
 sure to add DBWriteRequest to src/tools/pgindent/typedefs_list.

Will do.

 
 +/*
 + * There's no request for this DB yet, so lets create it 
 (allocate a
 + * space for it, set the values).
 + */
 +if (last_statrequests == NULL)
 +last_statrequests = palloc(sizeof(DBWriteRequest));
 +else
 +last_statrequests = repalloc(last_statrequests,
 +
 (num_statrequests + 1)*sizeof(DBWriteRequest));
 +
 +last_statrequests[num_statrequests].databaseid = 
 msg-databaseid;
 +last_statrequests[num_statrequests].request_time = 
 msg-clock_time;
 +num_statrequests += 1;
 
 Having to repalloc this array each time seems wrong.  Would a list
 instead of an array help?  see ilist.c/h; I vote for a dlist because you
 can easily delete elements from the middle of it, if required (I think
 you'd need that.)

Thanks. I'm not very familiar with the list interface, so I've used a
plain array. But yes, there are better ways than doing repalloc all the
time.

 
 +char db_statfile[strlen(pgstat_stat_db_filename) + 11];
 +snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
 + pgstat_stat_filename, dbentry-databaseid);
 
 This pattern seems rather frequent.  Can we use a macro or similar here?
 Encapsulating the 11 better would be good.  Magic numbers are evil.

Yes, this needs to be cleaned / improved.

 diff --git a/src/include/pgstat.h b/src/include/pgstat.h
 index 613c1c2..b3467d2 100644
 --- a/src/include/pgstat.h
 +++ b/src/include/pgstat.h
 @@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
  PgStat_MsgHdr m_hdr;
  TimestampTz clock_time; /* observed local clock time */
  TimestampTz cutoff_time;/* minimum acceptable file timestamp */
 +Oid databaseid; /* requested DB 
 (InvalidOid = all DBs) */
  } PgStat_MsgInquiry;
 
 Do we need to support the case that somebody requests stuff from the
 shared DB?  IIRC that's what InvalidOid means in pgstat ...

Frankly, I don't know, but I guess we do because it was in the original
code, and there are such inquiries right after the database starts
(that's why I had to add pgstat_write_db_dummyfile).

Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-02 Thread Jeff Janes
On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.

This has a warning:

pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
no prototype before its definition

I plan to do some performance testing, but that will take a while so I
wanted to post this before I get distracted.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-02 Thread Tomáš Vondra
Ok, thanks for the info. I'll look into that and I'll also post info from some 
of our production systems (we've deployed a 9.1-backpatched version about 2 
weeks ago).

T.

 Original message 
From: Jeff Janes jeff.ja...@gmail.com 
Date:  
To: Tomas Vondra t...@fuzzy.cz 
Cc: pgsql-hackers@postgresql.org 
Subject: Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum
  stress-testing our system 
 
On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.

This has a warning:

pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
no prototype before its definition

I plan to do some performance testing, but that will take a while so I
wanted to post this before I get distracted.

Cheers,

Jeff


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-01 Thread Alvaro Herrera
Tomas Vondra wrote:

 diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
 index be3adf1..4ec485e 100644
 --- a/src/backend/postmaster/pgstat.c
 +++ b/src/backend/postmaster/pgstat.c
 @@ -64,10 +64,14 @@
  
  /* --
   * Paths for the statistics files (relative to installation's $PGDATA).
 + * Permanent and temprorary, global and per-database files.

Note typo in the line above.

 -#define PGSTAT_STAT_PERMANENT_FILENAME   global/pgstat.stat
 -#define PGSTAT_STAT_PERMANENT_TMPFILEglobal/pgstat.tmp
 +#define PGSTAT_STAT_PERMANENT_DIRECTORY  stat
 +#define PGSTAT_STAT_PERMANENT_FILENAME   stat/global.stat
 +#define PGSTAT_STAT_PERMANENT_TMPFILEstat/global.tmp
 +#define PGSTAT_STAT_PERMANENT_DB_FILENAMEstat/%d.stat
 +#define PGSTAT_STAT_PERMANENT_DB_TMPFILE stat/%d.tmp

 +char*pgstat_stat_directory = NULL;
  char*pgstat_stat_filename = NULL;
  char*pgstat_stat_tmpname = NULL;
 +char*pgstat_stat_db_filename = NULL;
 +char*pgstat_stat_db_tmpname = NULL;

I don't like the quoted parts very much; it seems awkward to have the
snprintf patterns in one place and have them be used in very distant
places.  Is there a way to improve that?  Also, if I understand clearly,
the pgstat_stat_db_filename value needs to be an snprintf pattern too,
right?  What if it doesn't contain the required % specifier?

Also, if you can filter this through pgindent, that would be best.  Make
sure to add DBWriteRequest to src/tools/pgindent/typedefs_list.

 + /*
 +  * There's no request for this DB yet, so lets create it (allocate a
 +  * space for it, set the values).
 +  */
 + if (last_statrequests == NULL)
 +     last_statrequests = palloc(sizeof(DBWriteRequest));
 + else
 +     last_statrequests = repalloc(last_statrequests,
 +                     (num_statrequests + 1) * sizeof(DBWriteRequest));
 +
 + last_statrequests[num_statrequests].databaseid = msg->databaseid;
 + last_statrequests[num_statrequests].request_time = msg->clock_time;
 + num_statrequests += 1;

Having to repalloc this array each time seems wrong.  Would a list
instead of an array help?  see ilist.c/h; I vote for a dlist because you
can easily delete elements from the middle of it, if required (I think
you'd need that.)

 + char db_statfile[strlen(pgstat_stat_db_filename) + 11];
 + snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
 +          pgstat_stat_filename, dbentry->databaseid);

This pattern seems rather frequent.  Can we use a macro or similar here?
Encapsulating the 11 better would be good.  Magic numbers are evil.

 diff --git a/src/include/pgstat.h b/src/include/pgstat.h
 index 613c1c2..b3467d2 100644
 --- a/src/include/pgstat.h
 +++ b/src/include/pgstat.h
 @@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
   PgStat_MsgHdr m_hdr;
   TimestampTz clock_time; /* observed local clock time */
   TimestampTz cutoff_time;/* minimum acceptable file timestamp */
 + Oid databaseid; /* requested DB (InvalidOid = all DBs) */
  } PgStat_MsgInquiry;

Do we need to support the case that somebody requests stuff from the
shared DB?  IIRC that's what InvalidOid means in pgstat ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-01-05 Thread Tomas Vondra
On 3.1.2013 20:33, Magnus Hagander wrote:
 On Thu, Jan 3, 2013 at 8:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 18:47, Heikki Linnakangas wrote:
 How about creating the new directory as a direct subdir of $PGDATA,
 rather than buried in global? global is supposed to contain data
 related to shared catalog relations (plus pg_control), so it doesn't
 seem like the right location for per-database stat files. Also, if we're
 going to have admins manually zapping the directory (hopefully when the
 system is offline), that's less scary if the directory is not buried as
 deep.

 That's clearly possible and it's a trivial change. I was thinking about
 that actually, but then I placed the directory into global because
 that's where the pgstat.stat originally was.
 
 Yeah, +1 for a separate directory not in global.

OK, I moved the files from global/stat to stat.

Tomas
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..4ec485e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@
 
 /* --
  * Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temprorary, global and per-database files.
  * --
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME	"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE	"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY	"stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME	"stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE	"stat/global.tmp"
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME	"stat/%d.stat"
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE	"stat/%d.tmp"
 
 /* --
  * Timer definitions.
@@ -115,8 +119,11 @@ int pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * --
  */
+char  *pgstat_stat_directory = NULL;
 char  *pgstat_stat_filename = NULL;
 char  *pgstat_stat_tmpname = NULL;
+char  *pgstat_stat_db_filename = NULL;
+char  *pgstat_stat_db_tmpname = NULL;
 
 /*
  * BgWriter global statistics counters (unused in other processes).
@@ -219,11 +226,16 @@ static int localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static void pgstat_write_db_dummyfile(Oid databaseid);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
+static bool pgstat_write_statsfile_needed();
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +303,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
-
 /* 
  * Public functions called from postmaster follow
  * 
@@ -549,8 +566,34 @@ startup_failed:
 void
 pgstat_reset_all(void)
 {
-   unlink(pgstat_stat_filename);
-   unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+   DIR * dir;
+   struct dirent * entry;
+
+   dir = AllocateDir(pgstat_stat_directory);
+   while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
+   {
+		char fname[strlen(pgstat_stat_directory) + strlen(entry->d_name) + 1];
+
+   if 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-01-03 Thread Heikki Linnakangas

On 03.01.2013 01:15, Tomas Vondra wrote:

2) a new global/stat directory
--

The pgstat.stat file was originally saved into the global directory,
but with so many files that would get rather messy so I've created a new
global/stat directory and all the files are stored there.

This also means we can simply delete the files in the dir when
pgstat_reset_all is called.


How about creating the new directory as a direct subdir of $PGDATA, 
rather than buried in global? global is supposed to contain data 
related to shared catalog relations (plus pg_control), so it doesn't 
seem like the right location for per-database stat files. Also, if we're 
going to have admins manually zapping the directory (hopefully when the 
system is offline), that's less scary if the directory is not buried as 
deep.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-01-03 Thread Tomas Vondra
On 3.1.2013 18:47, Heikki Linnakangas wrote:
 On 03.01.2013 01:15, Tomas Vondra wrote:
 2) a new global/stat directory
 --

 The pgstat.stat file was originally saved into the global directory,
 but with so many files that would get rather messy so I've created a new
 global/stat directory and all the files are stored there.

 This also means we can simply delete the files in the dir when
 pgstat_reset_all is called.
 
 How about creating the new directory as a direct subdir of $PGDATA,
 rather than buried in global? global is supposed to contain data
 related to shared catalog relations (plus pg_control), so it doesn't
 seem like the right location for per-database stat files. Also, if we're
 going to have admins manually zapping the directory (hopefully when the
 system is offline), that's less scary if the directory is not buried as
 deep.

That's clearly possible and it's a trivial change. I was thinking about
that actually, but then I placed the directory into global because
that's where the pgstat.stat originally was.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-01-03 Thread Magnus Hagander
On Thu, Jan 3, 2013 at 8:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 18:47, Heikki Linnakangas wrote:
 On 03.01.2013 01:15, Tomas Vondra wrote:
 2) a new global/stat directory
 --

 The pgstat.stat file was originally saved into the global directory,
 but with so many files that would get rather messy so I've created a new
 global/stat directory and all the files are stored there.

 This also means we can simply delete the files in the dir when
 pgstat_reset_all is called.

 How about creating the new directory as a direct subdir of $PGDATA,
 rather than buried in global? global is supposed to contain data
 related to shared catalog relations (plus pg_control), so it doesn't
 seem like the right location for per-database stat files. Also, if we're
 going to have admins manually zapping the directory (hopefully when the
 system is offline), that's less scary if the directory is not buried as
 deep.

 That's clearly possible and it's a trivial change. I was thinking about
 that actually, but then I placed the directory into global because
 that's where the pgstat.stat originally was.

Yeah, +1 for a separate directory not in global.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers