Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Jeff Janes jeff.ja...@gmail.com wrote:
> Alvaro Herrera alvhe...@2ndquadrant.com wrote:
>> So here's v11.  I intend to commit this shortly.  (I wanted to get it
>> out before lunch, but I introduced a silly bug that took me a bit to
>> fix.)
>
> On Windows with Mingw I get this:
>
> pgstat.c:4389:8: warning: variable 'found' set but not used
> [-Wunused-but-set-variable]
>
> I don't get that on Linux, but I bet that is just the gcc version
> (4.6.2 vs 4.4.6) rather than the OS.

I get it on Linux with gcc version 4.7.2.

> It looks like it is just a useless variable

Agreed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Jeff Janes escribió:
> On Mon, Feb 18, 2013 at 7:50 AM, Alvaro Herrera
> alvhe...@2ndquadrant.com wrote:
>> So here's v11.  I intend to commit this shortly.  (I wanted to get it
>> out before lunch, but I introduced a silly bug that took me a bit to
>> fix.)
>
> On Windows with Mingw I get this:
>
> pgstat.c:4389:8: warning: variable 'found' set but not used
> [-Wunused-but-set-variable]
>
> I don't get that on Linux, but I bet that is just the gcc version
> (4.6.2 vs 4.4.6) rather than the OS.
>
> It looks like it is just a useless variable, rather than any possible
> cause of the Windows make check failure (which I can't reproduce).

Hm, I remember looking at that code and thinking that the return there
might not be the best idea, because it'd miss running the code that
checks for clock skew; the "found" variable was necessary because the
return was to be taken out.  But on second thought, a database for
which the loop terminates early has already run the clock-skew
detection code recently, so that's probably not worth worrying about.
IOW I will just remove that variable.  Thanks for the notice.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 19.02.2013 05:46, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>> I have pushed it now.  Further testing, of course, is always welcome.
>
> Mastodon failed:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01
>
> probably worth investigating a bit; we might have broken something.

Hmmm, interesting.  A single Windows machine, while the other Windows
machines seem to work fine (although some of them have not been built
for a few weeks).  I'll look into that, but I have no clue why this
might happen - except maybe for some unexpected timing issue or
something ...

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tomas Vondra t...@fuzzy.cz writes:
> On 19.02.2013 05:46, Alvaro Herrera wrote:
>> Mastodon failed:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01
>> probably worth investigating a bit; we might have broken something.
>
> Hmmm, interesting.  A single Windows machine, while the other Windows
> machines seem to work fine (although some of them were not built for a
> few weeks).

Could be random chance --- we've seen the same failure before, eg
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00

			regards, tom lane
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 19.02.2013 11:27, Tom Lane wrote:
> Tomas Vondra t...@fuzzy.cz writes:
>> Hmmm, interesting.  A single Windows machine, while the other Windows
>> machines seem to work fine (although some of them were not built for
>> a few weeks).
>
> Could be random chance --- we've seen the same failure before, eg
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00

Maybe.  But why does random chance happen to me only with regression
tests and not the lottery, like for normal people?

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 19.2.2013 11:27, Tom Lane wrote:
> Could be random chance --- we've seen the same failure before, eg
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00

I'm looking at that test, and I'm not really sure about a few details.
First, this function seems pretty useless to me:

===
create function wait_for_stats() returns void as $$
declare
  start_time timestamptz := clock_timestamp();
  updated bool;
begin
  -- we don't want to wait forever; loop will exit after 30 seconds
  for i in 1 .. 300 loop

    -- check to see if indexscan has been sensed
    SELECT (st.idx_scan >= pr.idx_scan + 1) INTO updated
      FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
     WHERE st.relname='tenk2' AND cl.relname='tenk2';

    exit when updated;

    -- wait a little
    perform pg_sleep(0.1);

    -- reset stats snapshot so we can test again
    perform pg_stat_clear_snapshot();

  end loop;

  -- report time waited in postmaster log (where it won't change test output)
  raise log 'wait_for_stats delayed % seconds',
    extract(epoch from clock_timestamp() - start_time);
end
$$ language plpgsql;
===

AFAIK the stats remain the same within a transaction, and as a function
runs within a transaction, it will either get new data on the first
iteration, or it will run all 300 of them.  I've checked several
buildfarm members and I've yet to see a single duration between 12 ms
and 30 sec.  So IMHO we could replace the function call with
pg_sleep(30) and get about the same effect.

But this obviously does not answer the question why it failed, although
on both occasions there's this log message:

  [50b1b7fa.0568:14] LOG:  wait_for_stats delayed 34.75 seconds

which essentially means the stats were not updated before the call to
wait_for_stats().  Anyway, there are these two failing queries:

===
-- check effects
SELECT st.seq_scan >= pr.seq_scan + 1,
       st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
       st.idx_scan >= pr.idx_scan + 1,
       st.idx_tup_fetch >= pr.idx_tup_fetch + 1
  FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
 WHERE st.relname='tenk2' AND cl.relname='tenk2';

 ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
 t        | t        | t        | t
(1 row)

SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
       st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
  FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
 WHERE st.relname='tenk2' AND cl.relname='tenk2';

 ?column? | ?column?
----------+----------
 t        | t
(1 row)
===

The first one returns just false values, the second one returns either
(t,f) or (f,f) - for the two failures posted by Alvaro and Tom Lane
earlier today.

I'm really wondering how that could happen.  The only thing I can think
of is some strange timing issue, causing lost requests to write the
stats or maybe some of the stats updates.  Hmmm, IIRC the stats are
sent over UDP - couldn't that be related?

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tomas Vondra wrote:
> AFAIK the stats remain the same within a transaction, and as a
> function runs within a transaction, it will either get new data on
> the first iteration, or it will run all 300 of them.  I've checked
> several buildfarm members and I've yet to see a single duration
> between 12 ms and 30 sec.

No, there's a call to pg_stat_clear_snapshot() that takes care of that.

> I'm really wondering how that could happen.  The only thing I can
> think of is some strange timing issue, causing lost requests to write
> the stats or maybe some of the stats updates.  Hmmm, IIRC the stats
> are sent over UDP - couldn't that be related?

Yes, UDP packet drops can certainly happen.  This is considered a
feature (do not cause backends to block when the network socket to the
stats collector is swamped; it's better to lose some stat messages
instead).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 19.2.2013 23:31, Alvaro Herrera wrote:
> No, there's a call to pg_stat_clear_snapshot() that takes care of
> that.

Aha!  Missed that for some reason.  Thanks.

> Yes, UDP packet drops can certainly happen.  This is considered a
> feature (do not cause backends to block when the network socket to
> the stats collector is swamped; it's better to lose some stat
> messages instead).

Is there anything we could add to the test to identify this?  Something
that either shows stats were sent and stats arrived (maybe in the log
only), or that some UDP packets were dropped?

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On Mon, Feb 18, 2013 at 7:50 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
> So here's v11.  I intend to commit this shortly.  (I wanted to get it
> out before lunch, but I introduced a silly bug that took me a bit to
> fix.)

On Windows with Mingw I get this:

pgstat.c:4389:8: warning: variable 'found' set but not used
[-Wunused-but-set-variable]

I don't get that on Linux, but I bet that is just the gcc version
(4.6.2 vs 4.4.6) rather than the OS.

It looks like it is just a useless variable, rather than any possible
cause of the Windows make check failure (which I can't reproduce).

Cheers,

Jeff
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tomas Vondra wrote:
> So, here's v10 of the patch (based on the v9+v9a), that implements the
> approach described above.  It turned out to be much easier than I
> expected (basically just a rewrite of the
> pgstat_read_db_statsfile_timestamp() function).

Thanks.  I'm giving this another look now.

I think the new code means we no longer need the first_write logic;
just let the collector idle until we get the first request.  (If for
some reason we considered that we should really be publishing initial
stats as early as possible, we could just do a write_statsfiles(allDbs)
call before entering the main loop.  But I don't see any reason to do
this.  If you do, please speak up.)

Also, it seems to me that the new pgstat_db_requested() logic is
slightly bogus (in the inefficient sense, not the incorrect sense): we
should be comparing the timestamp of the request vs. what's already on
disk instead of blindly returning true if the list is nonempty.  If the
request is older than the file, we don't need to write anything and can
discard the request.  For example, suppose that backend A sends a
request for a DB; we write the file.  If then quickly backend B also
requests stats for the same DB, with the current logic we'd go write
the file again, but perhaps backend B would be fine with the file we
already wrote.

Another point is that I think there's a better way to handle
nonexistent files, instead of having to read the global file and all
the DB records to find the one we want: just try to read the database
file, and only if that fails try to read the global file and compare
the timestamp.  (So there might be two timestamps for each DB, one in
the global file and one in the DB-specific file; I don't think this is
a problem.)  The point is to avoid having to read the global file if
possible.

So here's v11.  I intend to commit this shortly.  (I wanted to get it
out before lunch, but I introduced a silly bug that took me a bit to
fix.)
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 38,43 ****
--- 38,44 ----
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***************
*** 66,73 ****
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"

  /* ----------
   * Timer definitions.
--- 67,75 ----
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"

  /* ----------
   * Timer definitions.
***************
*** 115,120 ****
  int			pgstat_track_activity_query_size = 1024;
--- 117,123 ----
   * Built from GUC parameter
   * ----------
   */
+ char	   *pgstat_stat_directory = NULL;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
***************
*** 219,229 ****
  static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;

! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;

! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;

  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 222,237 ----
   */
  static PgStat_GlobalStats globalStats;

! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;

! /* Latest statistics request times from backends */
! static slist_head last_statrequests = SLIST_STATIC_INIT(last_statrequests);

  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***************
*** 252,262 ****
  static void pgstat_sighup_handler(SIGNAL_ARGS);
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
--- 260,275 ----
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 18.2.2013 16:50, Alvaro Herrera wrote:
> Thanks.  I'm giving this another look now.
>
> I think the new code means we no longer need the first_write logic;
> just let the collector idle until we get the first request.  (If for
> some reason we considered that we should really be publishing initial
> stats as early as possible, we could just do a
> write_statsfiles(allDbs) call before entering the main loop.  But I
> don't see any reason to do this.  If you do, please speak up.)
>
> Also, it seems to me that the new pgstat_db_requested() logic is
> slightly bogus (in the inefficient sense, not the incorrect sense):
> we should be comparing the timestamp of the request vs. what's
> already on disk instead of blindly returning true if the list is
> nonempty.  If the request is older than the file, we don't need to
> write anything and can discard the request.  For example, suppose
> that backend A sends a request for a DB; we write the file.  If then
> quickly backend B also requests stats for the same DB, with the
> current logic we'd go write the file again, but perhaps backend B
> would be fine with the file we already wrote.

Hmmm, you're probably right.

> Another point is that I think there's a better way to handle
> nonexistent files, instead of having to read the global file and all
> the DB records to find the one we want: just try to read the database
> file, and only if that fails try to read the global file and compare
> the timestamp.  (So there might be two timestamps for each DB, one in
> the global file and one in the DB-specific file; I don't think this
> is a problem.)  The point is to avoid having to read the global file
> if possible.

I don't think that's a good idea.  Keeping the timestamps in one place
is a significant simplification, and I don't think the alternative is
worth the additional complexity.  And the overhead is minimal.  So my
vote on this change is -1.

> So here's v11.  I intend to commit this shortly.  (I wanted to get it
> out before lunch, but I introduced a silly bug that took me a bit to
> fix.)

;-)

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tomas Vondra wrote:
> On 18.2.2013 16:50, Alvaro Herrera wrote:
>> Also, it seems to me that the new pgstat_db_requested() logic is
>> slightly bogus (in the inefficient sense, not the incorrect sense):
>> we should be comparing the timestamp of the request vs. what's
>> already on disk instead of blindly returning true if the list is
>> nonempty.  If the request is older than the file, we don't need to
>> write anything and can discard the request.  For example, suppose
>> that backend A sends a request for a DB; we write the file.  If then
>> quickly backend B also requests stats for the same DB, with the
>> current logic we'd go write the file again, but perhaps backend B
>> would be fine with the file we already wrote.
>
> Hmmm, you're probably right.

I left it as is for now; I think it warrants revisiting.

>> Another point is that I think there's a better way to handle
>> nonexistent files, instead of having to read the global file and all
>> the DB records to find the one we want: just try to read the
>> database file, and only if that fails try to read the global file
>> and compare the timestamp.  (So there might be two timestamps for
>> each DB, one in the global file and one in the DB-specific file; I
>> don't think this is a problem.)  The point is to avoid having to
>> read the global file if possible.
>
> I don't think that's a good idea.  Keeping the timestamps in one
> place is a significant simplification, and I don't think the
> alternative is worth the additional complexity.  And the overhead is
> minimal.  So my vote on this change is -1.

Fair enough.  I have pushed it now.  Further testing, of course, is
always welcome.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Alvaro Herrera wrote:
> I have pushed it now.  Further testing, of course, is always welcome.

Mastodon failed:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 17.2.2013 06:46, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> I've been thinking about this (actually I had a really weird dream
>> about it this night) and I think it might work like this:
>>
>> (1) check the timestamp of the global file - if it's too old, we
>>     need to send an inquiry or wait a bit longer
>>
>> (2) if it's new enough, we need to read it and look for that
>>     particular database - if it's not found, we have no info about
>>     it yet (this is the case handled by the dummy files)
>>
>> (3) if there's a database stat entry, we need to check the timestamp
>>     when it was written for the last time - if it's too old, send an
>>     inquiry and wait a bit longer
>>
>> (4) well, we have a recent global file, it contains the database
>>     stat entry and it's fresh enough - tadaa, we're done
>
> Hmm, yes, I think this is what I was imagining.  I had even
> considered that the timestamp would be removed from the per-db file
> as you suggest here.

So, here's v10 of the patch (based on the v9+v9a), that implements the
approach described above.  It turned out to be much easier than I
expected (basically just a rewrite of the
pgstat_read_db_statsfile_timestamp() function).

I've done a fair amount of testing (and will do some more next week)
but it seems to work just fine - no errors, no measurable decrease of
performance etc.

regards
Tomas Vondra

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 9b92ebb..36c0d8b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -38,6 +38,7 @@
 #include "access/xact.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_proc.h"
+#include "lib/ilist.h"
 #include "libpq/ip.h"
 #include "libpq/libpq.h"
 #include "libpq/pqsignal.h"
@@ -66,8 +67,9 @@
  * Paths for the statistics files (relative to installation's $PGDATA).
  * ----------
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"

 /* ----------
  * Timer definitions.
@@ -115,6 +117,8 @@ int			pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * ----------
  */
+char	   *pgstat_stat_directory = NULL;
+int			pgstat_stat_dbfile_maxlen = 0;
 char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
@@ -219,11 +223,16 @@ static int	localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;

-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+	slist_node	next;
+} DBWriteRequest;

-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request times from backends */
+static slist_head last_statrequests = SLIST_STATIC_INIT(last_statrequests);

 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +261,16 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 					 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfiles(bool permanent, bool allDbs);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool deep);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
+static bool pgstat_write_statsfile_needed(void);
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +299,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);

-
 /*
  * Public functions called from postmaster follow
  *
@@ -541,16 +554,40 @@ startup_failed:
 }

 /*
+ * subroutine for pgstat_reset_all
+ */
+static void
+pgstat_reset_remove_files(const char *directory)
+{
+	DIR		   *dir;
+	struct dirent *entry;
+	char		fname[MAXPGPATH];
+
+	dir = AllocateDir(pgstat_stat_directory);
+	while ((entry = ReadDir(dir,
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 15.2.2013 01:02, Tomas Vondra wrote:
> On 14.2.2013 22:24, Alvaro Herrera wrote:
>> Alvaro Herrera escribió:
>>> Here's a ninth version of this patch.  (Version 8 went
>>> unpublished.)  I have simplified a lot of things and improved some
>>> comments; I think I understand much of it now.  I think this patch
>>> is fairly close to committable, but one issue remains, which is
>>> this bit in pgstat_write_statsfiles():
>>
>> I've marked this as Waiting on Author for the time being.  I'm going
>> to review/work on other patches now, hoping that Tomas will post an
>> updated version in time for it to be considered for 9.3.
>
> Sadly I have no idea how to fix that, and I think the solution you
> suggested in the previous messages does not actually do the trick :-(

I've been thinking about this (actually I had a really weird dream
about it this night) and I think it might work like this:

(1) check the timestamp of the global file - if it's too old, we need
    to send an inquiry or wait a bit longer

(2) if it's new enough, we need to read it and look for that particular
    database - if it's not found, we have no info about it yet (this is
    the case handled by the dummy files)

(3) if there's a database stat entry, we need to check the timestamp
    when it was written for the last time - if it's too old, send an
    inquiry and wait a bit longer

(4) well, we have a recent global file, it contains the database stat
    entry and it's fresh enough - tadaa, we're done

At least that's my idea - I haven't tried to implement it yet.  I see a
few pros and cons of this approach:

pros:

* no dummy files
* no timestamps in the per-db files (and thus no sync issues)

cons:

* the backends / workers will have to re-read the global file just to
  check that the per-db file is there and is fresh enough

So far it was sufficient just to peek at the timestamp at the beginning
of the per-db stat file - minimum data read, no CPU-expensive
processing etc.  Sadly, the more DBs there are, the larger the global
file gets (and thus the more overhead there is to read it).

OTOH it's not that much data (~180 B per entry, so with 1000 databases
it's just ~180 kB), so I don't expect this to be a tremendous issue.
And the pros seem to be quite compelling.

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tomas Vondra wrote:
> I've been thinking about this (actually I had a really weird dream
> about it this night) and I think it might work like this:
>
> (1) check the timestamp of the global file - if it's too old, we need
>     to send an inquiry or wait a bit longer
>
> (2) if it's new enough, we need to read it and look for that
>     particular database - if it's not found, we have no info about it
>     yet (this is the case handled by the dummy files)
>
> (3) if there's a database stat entry, we need to check the timestamp
>     when it was written for the last time - if it's too old, send an
>     inquiry and wait a bit longer
>
> (4) well, we have a recent global file, it contains the database stat
>     entry and it's fresh enough - tadaa, we're done

Hmm, yes, I think this is what I was imagining.  I had even considered
that the timestamp would be removed from the per-db file as you suggest
here.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tomas Vondra escribió:
> On 14.2.2013 20:23, Alvaro Herrera wrote:
>> The problem here is that creating these dummy entries will cause a
>> difference in autovacuum behavior.  Autovacuum will skip processing
>> databases with no pgstat entry, and the intended reason is that if
>> there's no pgstat entry it's because the database doesn't have
>> enough activity.  Now perhaps we want to change that, but it should
>> be an explicit decision taken after discussion and thought, not a
>> side effect of an unrelated patch.
>
> I don't see how that changes the autovacuum behavior.  Can you
> explain that a bit more?  As I see it, with the old (single-file)
> version the autovacuum worker would get exactly the same thing, i.e.
> no stats at all.

See in autovacuum.c the calls to pgstat_fetch_stat_dbentry().  Most of
them check for a NULL result and act differently depending on that.
Returning a valid (not NULL) entry full of zeroes is not the same.

I didn't actually try to reproduce a problem.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 15.2.2013 16:38, Alvaro Herrera wrote:
> Tomas Vondra escribió:
>> I don't see how that changes the autovacuum behavior.  Can you
>> explain that a bit more?  As I see it, with the old (single-file)
>> version the autovacuum worker would get exactly the same thing, i.e.
>> no stats at all.
>
> See in autovacuum.c the calls to pgstat_fetch_stat_dbentry().  Most
> of them check for a NULL result and act differently depending on
> that.  Returning a valid (not NULL) entry full of zeroes is not the
> same.
>
> I didn't actually try to reproduce a problem.

Eh, but why would the patched code return an entry full of zeroes and
not NULL as before?  The dummy files serve a single purpose - to
confirm that the collector attempted to write info for the particular
database (and did not find any data for it).  All a dummy file contains
is a timestamp of the write - nothing else.

So the worker will read the global file (containing the list of stats
for the DBs) and then will get NULL just like the old code, because the
database is not there - and the patch does not change that at all.

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Alvaro Herrera escribió:
> Hm, and I now also realize another bug in this patch: the global stats only include database entries for requested databases; but perhaps the existing files can serve later requestors just fine for databases that already had files; so the global stats file should continue to carry entries for them, with the old timestamps.

Actually the code already does things that way -- apologies.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Here's a ninth version of this patch.  (version 8 went unpublished).  I have simplified a lot of things and improved some comments; I think I understand much of it now.  I think this patch is fairly close to committable, but one issue remains, which is this bit in pgstat_write_statsfiles():

	/*
	 * In any case, we can just throw away all the db requests, but we need to
	 * write dummy files for databases without a stat entry (it would cause
	 * issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
	 * This may happen e.g. for shared DB (oid = 0) right after initdb.
	 */
	if (!slist_is_empty(&last_statrequests))
	{
		slist_mutable_iter	iter;

		slist_foreach_modify(iter, &last_statrequests)
		{
			DBWriteRequest *req = slist_container(DBWriteRequest, next,
												  iter.cur);

			/*
			 * Create dummy files for requested databases without a proper
			 * dbentry.  It's much easier this way than dealing with multiple
			 * timestamps, possibly existing but not yet written DBs etc.
			 */
			if (!pgstat_get_db_entry(req->databaseid, false))
				pgstat_write_db_dummyfile(req->databaseid);

			pfree(req);
		}

		slist_init(&last_statrequests);
	}

The problem here is that creating these dummy entries will cause a difference in autovacuum behavior.  Autovacuum will skip processing databases with no pgstat entry, and the intended reason is that if there's no pgstat entry it's because the database doesn't have enough activity.  Now perhaps we want to change that, but it should be an explicit decision taken after discussion and thought, not a side effect of an unrelated patch.

Hm, and I now also realize another bug in this patch: the global stats only include database entries for requested databases; but perhaps the existing files can serve later requestors just fine for databases that already had files; so the global stats file should continue to carry entries for them, with the old timestamps.
-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 38,43 ****
--- 38,44 ----
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***************
*** 66,73 ****
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
--- 67,75 ----
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* ----------
   * Timer definitions.
***************
*** 115,120 ****
  int			pgstat_track_activity_query_size = 1024;
--- 117,124 ----
   * Built from GUC parameter
   * ----------
   */
+ char	   *pgstat_stat_directory = NULL;
+ int			pgstat_stat_dbfile_maxlen = 0;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
***************
*** 219,229 ****
  static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 223,238 ----
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request times from backends */
! static slist_head last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***************
*** 252,262 ****
  static void pgstat_sighup_handler(SIGNAL_ARGS);
  
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
I saw discussion about this on this thread, but I'm not able to figure out what the answer is: how does this work with moving the stats file, for example to a RAMdisk?  Specifically, if the user sets stats_temp_directory, does it continue to work the way it does now?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Josh Berkus wrote:
> I saw discussion about this on this thread, but I'm not able to figure out what the answer is: how does this work with moving the stats file, for example to a RAMdisk?  Specifically, if the user sets stats_temp_directory, does it continue to work the way it does now?

Of course.  You get more files than previously, but yes.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Alvaro Herrera escribió:
> Here's a ninth version of this patch.  (version 8 went unpublished).  I have simplified a lot of things and improved some comments; I think I understand much of it now.  I think this patch is fairly close to committable, but one issue remains, which is this bit in pgstat_write_statsfiles():
>
> 	/*
> 	 * In any case, we can just throw away all the db requests, but we need to
> 	 * write dummy files for databases without a stat entry (it would cause
> 	 * issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
> 	 * This may happen e.g. for shared DB (oid = 0) right after initdb.
> 	 */

I think the real way to handle this is to fix backend_read_statsfile().  It's using the old logic of considering existence of the file, but of course now the file might not exist at all, and that doesn't mean we need to continue kicking the collector to write it.  We need a mechanism to figure out that the collector is just not going to write the file no matter how hard we kick it ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Alvaro Herrera escribió:
> Here's a ninth version of this patch.  (version 8 went unpublished).  I have simplified a lot of things and improved some comments; I think I understand much of it now.  I think this patch is fairly close to committable, but one issue remains, which is this bit in pgstat_write_statsfiles():

I've marked this as Waiting on author for the time being.  I'm going to review/work on other patches now, hoping that Tomas will post an updated version in time for it to be considered for 9.3.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 14.2.2013 20:43, Josh Berkus wrote:
> I saw discussion about this on this thread, but I'm not able to figure out what the answer is: how does this work with moving the stats file, for example to a RAMdisk?  Specifically, if the user sets stats_temp_directory, does it continue to work the way it does now?

No change in this respect - you can still use a RAMdisk, and you'll actually need less space because the space requirements decreased due to breaking the single file into multiple pieces.  We're using it this way (on a tmpfs filesystem) and it works like a charm.

regards
Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
First of all, big thanks for working on this patch and not only identifying the issues but actually fixing them.

On 14.2.2013 20:23, Alvaro Herrera wrote:
> Here's a ninth version of this patch.  (version 8 went unpublished).  I have simplified a lot of things and improved some comments; I think I understand much of it now.  I think this patch is fairly close to committable, but one issue remains, which is this bit in pgstat_write_statsfiles():
> ...
> The problem here is that creating these dummy entries will cause a difference in autovacuum behavior.  Autovacuum will skip processing databases with no pgstat entry, and the intended reason is that if there's no pgstat entry it's because the database doesn't have enough activity.  Now perhaps we want to change that, but it should be an explicit decision taken after discussion and thought, not a side effect of an unrelated patch.

I don't see how that changes the autovacuum behavior.  Can you explain that a bit more?  As I see it, with the old (single-file version) the autovacuum worker would get exactly the same thing, i.e. no stats at all.

Which is exactly what the autovacuum worker gets with the new code, except that the check for the last statfile timestamp uses the per-db file, so we need to write it.  This way the worker is able to read the timestamp and is happy about it because it gets a fresh file, although it gets no stats later.

Where is the behavior change?  Can you provide an example?

kind regards
Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 14.2.2013 22:24, Alvaro Herrera wrote:
> Alvaro Herrera escribió:
>> Here's a ninth version of this patch.  (version 8 went unpublished).  I have simplified a lot of things and improved some comments; I think I understand much of it now.  I think this patch is fairly close to committable, but one issue remains, which is this bit in pgstat_write_statsfiles():
>
> I've marked this as Waiting on author for the time being.  I'm going to review/work on other patches now, hoping that Tomas will post an updated version in time for it to be considered for 9.3.

Sadly I have no idea how to fix that, and I think the solution you suggested in the previous messages does not actually do the trick :-(

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tomas Vondra escribió:
> I don't see how that changes the autovacuum behavior.  Can you explain that a bit more?

It might be that I'm all wet on this.  I'll poke at it some more.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Here's an updated version of this patch that takes care of the issues I reported previously: no more repalloc() of the requests array; it's now an slist, which makes the code much more natural IMV.  And no more messing around with doing sprintf to create a separate sprintf pattern for the per-db stats file; instead have a function to return the name that uses just the pgstat dir as stored by GUC.  I think this can be further simplified still.

I haven't reviewed the rest yet; please do give this a try to confirm that the speedups previously reported are still there (i.e. I didn't completely blow it).

Thanks

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 38,43 ****
--- 38,44 ----
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***************
*** 66,73 ****
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
--- 67,75 ----
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* ----------
   * Timer definitions.
***************
*** 115,120 ****
  int			pgstat_track_activity_query_size = 1024;
--- 117,123 ----
   * Built from GUC parameter
   * ----------
   */
+ char	   *pgstat_stat_directory = NULL;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
***************
*** 219,229 ****
  static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 222,237 ----
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request time from backends for each DB */
! static slist_head last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***************
*** 252,262 ****
  static void pgstat_sighup_handler(SIGNAL_ARGS);
  
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
--- 260,276 ----
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent, bool force);
! static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
! static void pgstat_write_db_dummyfile(Oid databaseid);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
! static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash,
  					 HTAB *funchash, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
+ static bool pgstat_write_statsfile_needed(void);
+ static bool pgstat_db_requested(Oid databaseid);
+ 
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
***************
*** 285,291 ****
  static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
  static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
  static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
  
- 
  /*
   * Public functions called from postmaster follow
   *
--- 299,304 ----
***************
*** 549,556
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
>> We do not already have this.  There is no relevant spec.  I can't see how this could need pg_dump support (but what about pg_upgrade?)
>
> pg_dump - no
> pg_upgrade - IMHO it should create the pg_stat directory.  I don't think it could convert the statfile into the new format (by splitting it into the pieces).  I haven't checked but I believe the default behavior is to delete it as there might be new fields / slight changes of meaning etc.

Right, I have no concerns with pg_upgrade any more.  The pg_stat directory will inherently get created by the initdb of the new cluster (because the initdb will be done with the new binaries, with your patch in them).

pg_upgrade currently doesn't copy over global/pgstat.stat.  So that means the new cluster doesn't have the activity stats either way, patch or unpatched.  So if it is not currently a problem, it will not become one under the proposed patch.

Cheers,

Jeff
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Pavel Stehule pavel.steh...@gmail.com writes:
>> Nice.  Another interesting numbers would be device utilization, average I/O speed and required space (which should be ~2x the pgstat.stat size without the patch).
>
> this point is important - with large warehouse with lot of databases and tables you have move stat file to some ramdisc - without it you lost lot of IO capacity - and it is very important if you need only half sized ramdisc

[ blink... ]  I confess I'd not been paying close attention to this thread, but if that's true I'd say the patch is DOA.  Why should we accept 2x bloat in the already-far-too-large stats file?  I thought the idea was just to split up the existing data into multiple files.

			regards, tom lane
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tom Lane escribió:
> Pavel Stehule pavel.steh...@gmail.com writes:
>>> Nice.  Another interesting numbers would be device utilization, average I/O speed and required space (which should be ~2x the pgstat.stat size without the patch).
>>
>> this point is important - with large warehouse with lot of databases and tables you have move stat file to some ramdisc - without it you lost lot of IO capacity - and it is very important if you need only half sized ramdisc
>
> [ blink... ]  I confess I'd not been paying close attention to this thread, but if that's true I'd say the patch is DOA.  Why should we accept 2x bloat in the already-far-too-large stats file?  I thought the idea was just to split up the existing data into multiple files.

I think they are saying just the opposite: maximum disk space utilization is now half of what the unpatched code requires.  This is because when we need to write the temporary file to rename on top of the other one, the temporary file is not the size of the complete pgstat data collection, but just that for the requested database.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
2013/2/6 Alvaro Herrera alvhe...@2ndquadrant.com:
> I think they are saying just the opposite: maximum disk space utilization is now half of what the unpatched code requires.  This is because when we need to write the temporary file to rename on top of the other one, the temporary file is not the size of the complete pgstat data collection, but just that for the requested database.

+1

Pavel
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Dne 06.02.2013 16:53, Alvaro Herrera napsal:
> Tom Lane escribió:
>> [ blink... ]  I confess I'd not been paying close attention to this thread, but if that's true I'd say the patch is DOA.  Why should we accept 2x bloat in the already-far-too-large stats file?  I thought the idea was just to split up the existing data into multiple files.
>
> I think they are saying just the opposite: maximum disk space utilization is now half of what the unpatched code requires.  This is because when we need to write the temporary file to rename on top of the other one, the temporary file is not the size of the complete pgstat data collection, but just that for the requested database.

Exactly.  And I suspect the current (unpatched) code often requires more than twice the space because of open file descriptors to already deleted files.

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
> On 5.2.2013 19:23, Jeff Janes wrote:
>> If I shutdown the server and blow away the stats with rm data/pg_stat/*, it recovers gracefully when I start it back up.  If I do rm -r data/pg_stat then it has problems the next time I shut it down, but I have no right to do that in the first place.
>>
>> If I initdb a database without this patch, then shut it down and restart with binaries that include this patch, I need to manually make the pg_stat directory.  Does that mean it needs a catalog bump in order to force an initdb?
>
> U, what you mean by catalog bump?

There is a catalog number in src/include/catalog/catversion.h, which when changed forces one to redo initdb.  Formally I guess it is only for system catalog changes, but I thought it was used for any on-disk changes during development cycles.

I like it the way it is, as I can use the same data directory for both versions of the binary (patched and unpatched), and just manually create or remove the pg_stat directory when changing modes.  That is ideal for testing this patch, probably not ideal for being committed into the tree along with all the other ongoing devel work.  But I think this is something the committer has to worry about.

>> I have a question about its completeness.  When I first start up the cluster and have not yet touched it, there is very little stats collector activity, either with or without this patch.  When I kick the cluster sufficiently (I've been using vacuumdb -a to do that) then there is a lot of stats collector activity.  Even once the vacuumdb has long finished, this high level of activity continues even though the database is otherwise completely idle, and this seems to happen forever.  This patch makes that high level of activity much more efficient, but it does not reduce the activity.  I don't understand why an idle database cannot get back into the state right after start-up.
>
> What do you mean by stats collector activity?  Is it reading/writing a lot of data, or is it just using a lot of CPU?

Basically, the launching of new autovac workers and the work that that entails.  Your patch reduces the size of data that needs to be written, read, and parsed for every launch, but not the number of times that that happens.

> Isn't that just a natural and expected behavior because the database needs to actually perform ANALYZE to actually collect the data.  Although the tables are empty, it costs some CPU / IO and there's a lot of them (1000 dbs, each with 200 tables).

It isn't touching the tables at all, just the stats files.

I was wrong about the cluster opening quiet.  It only does that if, while the cluster was shutdown, you remove the statistics files, which I was doing as I was switching back and forth between patched and unpatched.  When the cluster opens, any databases that don't have statistics in the stat file(s) will not get an autovacuum worker process spawned.  They only start getting spawned once someone asks for statistics for that database.  But then once that happens, that database then gets a worker spawned for it every naptime (or, at least, as close to that as the server can keep up with) for eternity, even if that database is never used again.  The only way to stop this is the unsupported way of blowing away the permanent stats files.

> I don't think there's a way around this.  You may increase the autovacuum naptime, but that's about all.

>> I do not think that the patch needs to solve this problem in order to be accepted, but if it can be addressed while the author and reviewers are paying attention to this part of the system, that would be ideal.  And if not, then we should at least remember that there is future work that could be done here.
>
> If I understand that correctly, you see the same behaviour even without the patch, right?  In that case I'd vote not to make the patch more complex, and try to improve that separately (if it's even possible).

OK.  I just thought that while digging through the code, you might have a good idea for fixing this part as well.  If so, it would be a shame for that idea to be lost when you move on to other things.

>> I created 1000 databases each with 200 single column tables (x serial primary key).  After vacuumdb -a, I let it idle for a long time to see what steady state was reached.
>>
>> without the patch:
>>   vacuumdb -a         real 11m2.624s
>>   idle steady state:  48.17% user, 39.24% system, 11.78% iowait, 0.81% idle
>>
>> with the patch:
>>   vacuumdb -a         real 6m41.306s
>>   idle steady state:  7.86% user, 5.00% system, 0.09% iowait, 87% idle
>
> Nice.  Another interesting numbers would be device utilization, average I/O speed

I didn't gather that data, as I never figured out how to interpret those numbers and so don't have much faith in them.  (But I am pretty impressed with the numbers I do understand.)

> and required space (which should be ~2x the pgstat.stat size without
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Jeff Janes jeff.ja...@gmail.com writes:
> On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
>> U, what you mean by catalog bump?
>
> There is a catalog number in src/include/catalog/catversion.h, which when changed forces one to redo initdb.  Formally I guess it is only for system catalog changes, but I thought it was used for any on-disk changes during development cycles.

Yeah, it would be appropriate to bump the catversion if we're creating a new PGDATA subdirectory.  I'm not excited about keeping code to take care of the lack of such a subdirectory at runtime, as I gather there is in the current state of the patch.  Formally, if there were such code, we'd not need a catversion bump --- the rule of thumb is to change catversion if the new postgres executable would fail regression tests without a run of the new initdb.  But it's pretty dumb to keep such code indefinitely, when it would have no more possible use after the next catversion bump (which is seldom more than a week or two away during devel phase).

>> What do you mean by stats collector activity?  Is it reading/writing a lot of data, or is it just using a lot of CPU?
>
> Basically, the launching of new autovac workers and the work that that entails.  Your patch reduces the size of data that needs to be written, read, and parsed for every launch, but not the number of times that that happens.

It doesn't seem very reasonable to ask this patch to redesign the autovacuum algorithms, which is essentially what it'll take to improve that.  That's a completely separate layer of code.

			regards, tom lane
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 7.2.2013 00:40, Tom Lane wrote:
> Jeff Janes jeff.ja...@gmail.com writes:
>> On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
>>> U, what you mean by catalog bump?
>>
>> There is a catalog number in src/include/catalog/catversion.h, which when changed forces one to redo initdb.  Formally I guess it is only for system catalog changes, but I thought it was used for any on-disk changes during development cycles.
>
> Yeah, it would be appropriate to bump the catversion if we're creating a new PGDATA subdirectory.  I'm not excited about keeping code to take care of the lack of such a subdirectory at runtime, as I gather there is in the current state of the patch.  Formally, if there were such code, we'd not need a

No, there is nothing to handle that at runtime.  The directory is created at initdb and the patch expects that (and fails if it's gone).

> catversion bump --- the rule of thumb is to change catversion if the new postgres executable would fail regression tests without a run of the new initdb.  But it's pretty dumb to keep such code indefinitely, when it would have no more possible use after the next catversion bump (which is seldom more than a week or two away during devel phase).
>
>>> What do you mean by stats collector activity?  Is it reading/writing a lot of data, or is it just using a lot of CPU?
>>
>> Basically, the launching of new autovac workers and the work that that entails.  Your patch reduces the size of data that needs to be written, read, and parsed for every launch, but not the number of times that that happens.
>
> It doesn't seem very reasonable to ask this patch to redesign the autovacuum algorithms, which is essentially what it'll take to improve that.  That's a completely separate layer of code.

My opinion, exactly.

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On Sun, Feb 3, 2013 at 4:51 PM, Tomas Vondra t...@fuzzy.cz wrote: LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than collector's time 2013-02-04 00:54:21.113439+01 for db 19093 WARNING: pgstat wait timeout LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than collector's time 2013-02-04 00:54:31.424681+01 for db 46494 FATAL: could not find block containing chunk 0x2af4a60 LOG: statistics collector process (PID 10063) exited with exit code 1 WARNING: pgstat wait timeout WARNING: pgstat wait timeout I'm not entirely sure where the FATAL came from, but it seems it was somehow related to the issue - it was quite reproducible, although I don't see how exactly this could happen. The relevant block of code looks like this: char *writetime; char *mytime; /* Copy because timestamptz_to_str returns a static buffer */ writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp)); mytime = pstrdup(timestamptz_to_str(cur_ts)); elog(LOG, "last_statwrite %s is later than collector's time %s for db %d", writetime, mytime, dbentry->databaseid); pfree(writetime); pfree(mytime); which seems quite fine to me. I'm not sure how one of the pfree calls could fail? I don't recall seeing the FATAL errors myself, but didn't keep the logfile around. (I do recall seeing the pgstat wait timeout). Are you using Windows? pstrdup seems to be different there. I'm afraid I don't have much to say on the code. Indeed I never even looked at it (other than grepping for pstrdup just now). I am taking a purely experimental approach, since Alvaro and others have looked at the code. Anyway, attached is a patch that fixes all three issues, i.e. 1) the un-initialized timestamp 2) the void omitted from the signature 3) rename to pg_stat instead of just stat Thanks. If I shutdown the server and blow away the stats with rm data/pg_stat/*, it recovers gracefully when I start it back up. 
If I do rm -r data/pg_stat then it has problems the next time I shut it down, but I have no right to do that in the first place. If I initdb a database without this patch, then shut it down and restart with binaries that include this patch, I need to manually make the pg_stat directory. Does that mean it needs a catalog bump in order to force an initdb? A review: It applies cleanly (some offsets, no fuzz), builds without warnings, and passes make check including with cassert. The final test done in make check inherently tests this code, and it passes. If I intentionally break the patch by making pgstat_read_db_statsfile add one to the oid it opens, then the test fails. So the existing test is at least plausible as a test. doc/src/sgml/monitoring.sgml needs to be changed: "a permanent copy of the statistics data is stored in the global subdirectory". I'm not aware of any other needed changes to the docs. The big question is whether we want this. I think we do. While having hundreds of databases in a cluster is not recommended, that is no reason not to handle it better than we do. I don't see any down-sides, other than possibly some code uglification. Some file systems might not deal well with having lots of small stats files being rapidly written and rewritten, but it is hard to see how the current behavior would be more favorable for those systems. We do not already have this. There is no relevant spec. I can't see how this could need pg_dump support (but what about pg_upgrade?) I am not aware of any dangers. I have a question about its completeness. When I first start up the cluster and have not yet touched it, there is very little stats collector activity, either with or without this patch. When I kick the cluster sufficiently (I've been using vacuumdb -a to do that) then there is a lot of stats collector activity. 
Even once the vacuumdb has long finished, this high level of activity continues even though the database is otherwise completely idle, and this seems to happen forever. This patch makes that high level of activity much more efficient, but it does not reduce the activity. I don't understand why an idle database cannot get back into the state right after start-up. I do not think that the patch needs to solve this problem in order to be accepted, but if it can be addressed while the author and reviewers are paying attention to this part of the system, that would be ideal. And if not, then we should at least remember that there is future work that could be done here. I created 1000 databases each with 200 single column tables (x serial primary key). After vacuumdb -a, I let it idle for a long time to see what steady state was reached. without the patch: vacuumdb -a: real 11m2.624s; idle steady state: 48.17% user, 39.24% system, 11.78% iowait, 0.81% idle. with the patch: vacuumdb -a: real 6m41.306s; idle steady state: 7.86% user, 5.00% system, 0.09% iowait, 87% idle. I also ran pgbench on a
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 5.2.2013 19:23, Jeff Janes wrote: On Sun, Feb 3, 2013 at 4:51 PM, Tomas Vondra t...@fuzzy.cz wrote: LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than collector's time 2013-02-04 00:54:21.113439+01 for db 19093 WARNING: pgstat wait timeout LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than collector's time 2013-02-04 00:54:31.424681+01 for db 46494 FATAL: could not find block containing chunk 0x2af4a60 LOG: statistics collector process (PID 10063) exited with exit code 1 WARNING: pgstat wait timeout WARNING: pgstat wait timeout I'm not entirely sure where the FATAL came from, but it seems it was somehow related to the issue - it was quite reproducible, although I don't see how exactly this could happen. The relevant block of code looks like this: char *writetime; char *mytime; /* Copy because timestamptz_to_str returns a static buffer */ writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp)); mytime = pstrdup(timestamptz_to_str(cur_ts)); elog(LOG, "last_statwrite %s is later than collector's time %s for db %d", writetime, mytime, dbentry->databaseid); pfree(writetime); pfree(mytime); which seems quite fine to me. I'm not sure how one of the pfree calls could fail? I don't recall seeing the FATAL errors myself, but didn't keep the logfile around. (I do recall seeing the pgstat wait timeout). Are you using Windows? pstrdup seems to be different there. Nope. I'll repeat the test with the original patch to find out what went wrong, just to be sure it was fixed. I'm afraid I don't have much to say on the code. Indeed I never even looked at it (other than grepping for pstrdup just now). I am taking a purely experimental approach, since Alvaro and others have looked at the code. Thanks for finding the issue with the uninitialized timestamp! If I shutdown the server and blow away the stats with rm data/pg_stat/*, it recovers gracefully when I start it back up. 
If I do rm -r data/pg_stat then it has problems the next time I shut it down, but I have no right to do that in the first place. If I initdb a database without this patch, then shut it down and restart with binaries that include this patch, I need to manually make the pg_stat directory. Does that mean it needs a catalog bump in order to force an initdb? Uh, what do you mean by catalog bump? Anyway, messing with files in the base directory is a bad idea in general, and I don't think that's a reason to treat the pg_stat directory differently. If you remove it by hand, you'll be rightfully punished by various errors. A review: It applies cleanly (some offsets, no fuzz), builds without warnings, and passes make check including with cassert. The final test done in make check inherently tests this code, and it passes. If I intentionally break the patch by making pgstat_read_db_statsfile add one to the oid it opens, then the test fails. So the existing test is at least plausible as a test. doc/src/sgml/monitoring.sgml needs to be changed: "a permanent copy of the statistics data is stored in the global subdirectory". I'm not aware of any other needed changes to the docs. Yeah, that should be in the global/pg_stat subdirectory. The big question is whether we want this. I think we do. While having hundreds of databases in a cluster is not recommended, that is no reason not to handle it better than we do. I don't see any down-sides, other than possibly some code uglification. Some file systems might not deal well with having lots of small stats files being rapidly written and rewritten, but it is hard to see how the current behavior would be more favorable for those systems. If the filesystem has issues with that many entries, it's already hosed by the contents of the base directory (one per database) or of the database directories (multiple files per table). 
Moreover, it's still possible to use tmpfs to handle this at runtime (which is often the recommended solution with the current code), and use the actual filesystem only for keeping the data across restarts. We do not already have this. There is no relevant spec. I can't see how this could need pg_dump support (but what about pg_upgrade?) pg_dump - no pg_upgrade - IMHO it should create the pg_stat directory. I don't think it could convert the stat file into the new format (by splitting it into the pieces). I haven't checked but I believe the default behavior is to delete it as there might be new fields / slight changes of meaning etc. I am not aware of any dangers. I have a question about its completeness. When I first start up the cluster and have not yet touched it, there is very little stats collector activity, either with or without this patch. When I kick the cluster sufficiently (I've been using vacuumdb -a to do that) then there is a lot of stats collector activity. Even once the vacuumdb has long finished,
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
with the patch: vacuumdb -a: real 6m41.306s; idle steady state: 7.86% user, 5.00% system, 0.09% iowait, 87% idle. Nice. Other interesting numbers would be device utilization, average I/O speed, and required space (which should be ~2x the pgstat.stat size without the patch). This point is important - with a large warehouse with lots of databases and tables you have to move the stat file to a ramdisk - without it you lose a lot of IO capacity - and it is very helpful if you then need only a half-sized ramdisk. Regards Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
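For reference, the ramdisk workaround Pavel alludes to is normally done by pointing the stats_temp_directory GUC at a tmpfs mount; the permanent copy written at shutdown stays under $PGDATA and survives restarts. The mount point and size below are examples only, not recommendations:

```ini
# /etc/fstab -- a small RAM-backed filesystem for the runtime stats files
# tmpfs  /run/pg_stats_tmp  tmpfs  size=64M,uid=postgres,gid=postgres,mode=0700  0 0

# postgresql.conf -- write the frequently rewritten temporary stats
# files to the ramdisk instead of the data directory
stats_temp_directory = '/run/pg_stats_tmp'
```

With the per-database split, the half-sized point is that only the requested database's file needs to fit through the write path at once, rather than one monolithic pgstat.stat plus its temp copy.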
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 20:33, Magnus Hagander wrote: Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. Why stat rather than pg_stat? The existence of global and base as exceptions already annoys me. (Especially when I do a tar -xf in my home directory without remembering the -C flag). Unless there is some unstated rule behind what gets a pg_ and what doesn't, I think we should have the pg_. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On Sat, Feb 2, 2013 at 2:33 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 20:33, Magnus Hagander wrote: Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. This has a warning: pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with no prototype before its definition I plan to do some performance testing, but that will take a while so I wanted to post this before I get distracted. Running vacuumdb -a on a cluster with 1000 db with 200 tables (x serial primary key) in each, I get log messages like this: last_statwrite 23682-06-18 22:36:52.960194-07 is later than collector's time 2013-02-03 12:49:19.700629-08 for db 16387 Note the bizarre year in the first time stamp. If it matters, I got this after shutting down the cluster, blowing away $DATA/stat/*, then restarting it and invoking vacuumdb. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 3.2.2013 20:46, Jeff Janes wrote: On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 20:33, Magnus Hagander wrote: Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. Why stat rather than pg_stat? The existence of global and base as exceptions already annoys me. (Especially when I do a tar -xf in my home directory without remembering the -C flag). Unless there is some unstated rule behind what gets a pg_ and what doesn't, I think we should have the pg_. I don't think there's a clear naming rule. But I think your suggestion makes perfect sense, especially because we have pg_stat_tmp directory. So now we'd have pg_stat and pg_stat_tmp, which is quite elegant. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 2.2.2013 23:33, Jeff Janes wrote: On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 20:33, Magnus Hagander wrote: Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. This has a warning: pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with no prototype before its definition I forgot to add void into the method prototype ... Thanks! Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 3.2.2013 21:54, Jeff Janes wrote: On Sat, Feb 2, 2013 at 2:33 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 20:33, Magnus Hagander wrote: Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. This has a warning: pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with no prototype before its definition I plan to do some performance testing, but that will take a while so I wanted to post this before I get distracted. Running vacuumdb -a on a cluster with 1000 db with 200 tables (x serial primary key) in each, I get log messages like this: last_statwrite 23682-06-18 22:36:52.960194-07 is later than collector's time 2013-02-03 12:49:19.700629-08 for db 16387 Note the bizarre year in the first time stamp. If it matters, I got this after shutting down the cluster, blowing away $DATA/stat/*, then restarting it and invoking vacuumdb. I somehow expected that hash_search zeroes all the fields of a new entry, but looking at pgstat_get_db_entry that obviously is not the case. So stats_timestamp (which tracks timestamp of the last write for a DB) was random - that's where the bizarre year values came from. I've added a proper initialization (to 0), and now it works as expected. 
Although the whole sequence of errors I was getting was this: LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than collector's time 2013-02-04 00:54:21.113439+01 for db 19093 WARNING: pgstat wait timeout LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than collector's time 2013-02-04 00:54:31.424681+01 for db 46494 FATAL: could not find block containing chunk 0x2af4a60 LOG: statistics collector process (PID 10063) exited with exit code 1 WARNING: pgstat wait timeout WARNING: pgstat wait timeout I'm not entirely sure where the FATAL came from, but it seems it was somehow related to the issue - it was quite reproducible, although I don't see how exactly this could happen. The relevant block of code looks like this: char *writetime; char *mytime; /* Copy because timestamptz_to_str returns a static buffer */ writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp)); mytime = pstrdup(timestamptz_to_str(cur_ts)); elog(LOG, "last_statwrite %s is later than collector's time %s for db %d", writetime, mytime, dbentry->databaseid); pfree(writetime); pfree(mytime); which seems quite fine to me. I'm not sure how one of the pfree calls could fail? Anyway, attached is a patch that fixes all three issues, i.e. 1) the un-initialized timestamp 2) the void omitted from the signature 3) rename to pg_stat instead of just stat Tomas diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index d318db9..6d0efe9 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -64,10 +64,14 @@ /* -- * Paths for the statistics files (relative to installation's $PGDATA). + * Permanent and temprorary, global and per-database files. 
* -- */ -#define PGSTAT_STAT_PERMANENT_FILENAME global/pgstat.stat -#define PGSTAT_STAT_PERMANENT_TMPFILE global/pgstat.tmp +#define PGSTAT_STAT_PERMANENT_DIRECTORY pg_stat +#define PGSTAT_STAT_PERMANENT_FILENAME pg_stat/global.stat +#define PGSTAT_STAT_PERMANENT_TMPFILE pg_stat/global.tmp +#define PGSTAT_STAT_PERMANENT_DB_FILENAME pg_stat/%d.stat +#define PGSTAT_STAT_PERMANENT_DB_TMPFILE pg_stat/%d.tmp /* -- * Timer definitions. @@ -115,8 +119,11 @@ int pgstat_track_activity_query_size = 1024; * Built from GUC parameter * -- */ +char *pgstat_stat_directory = NULL; char *pgstat_stat_filename = NULL; char *pgstat_stat_tmpname = NULL; +char *pgstat_stat_db_filename = NULL; +char *pgstat_stat_db_tmpname = NULL; /* * BgWriter global statistics counters (unused in other processes). @@ -219,11 +226,16 @@ static int localNumBackends = 0; */ static PgStat_GlobalStats globalStats; -/* Last time the collector successfully wrote the stats file */ -static TimestampTz last_statwrite; +/* Write request info for each database */ +typedef struct DBWriteRequest +{ + Oid databaseid; /* OID of the database to write */ + TimestampTz request_time; /* timestamp of the last write request */ +} DBWriteRequest; -/* Latest statistics request time from backends */ -static TimestampTz last_statrequest; +/* Latest statistics request time from backends for each DB */ +static DBWriteRequest * last_statrequests = NULL; +static int num_statrequests = 0; static volatile bool need_exit = false; static volatile bool got_SIGHUP = false; @@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS); static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create); static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid,
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 1.2.2013 17:19, Alvaro Herrera wrote: Tomas Vondra wrote: diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index be3adf1..4ec485e 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -64,10 +64,14 @@ /* -- * Paths for the statistics files (relative to installation's $PGDATA). + * Permanent and temprorary, global and per-database files. Note typo in the line above. -#define PGSTAT_STAT_PERMANENT_FILENAME "global/pgstat.stat" -#define PGSTAT_STAT_PERMANENT_TMPFILE "global/pgstat.tmp" +#define PGSTAT_STAT_PERMANENT_DIRECTORY "stat" +#define PGSTAT_STAT_PERMANENT_FILENAME "stat/global.stat" +#define PGSTAT_STAT_PERMANENT_TMPFILE "stat/global.tmp" +#define PGSTAT_STAT_PERMANENT_DB_FILENAME "stat/%d.stat" +#define PGSTAT_STAT_PERMANENT_DB_TMPFILE "stat/%d.tmp" +char *pgstat_stat_directory = NULL; char *pgstat_stat_filename = NULL; char *pgstat_stat_tmpname = NULL; +char *pgstat_stat_db_filename = NULL; +char *pgstat_stat_db_tmpname = NULL; I don't like the quoted parts very much; it seems awkward to have the snprintf patterns in one place and have them be used in very distant places. Is there a way to improve that? I don't see that as particularly awkward, but that's a matter of taste. I still see that as a bunch of constants that are sprintf patterns at the same time. Also, if I understand clearly, the pgstat_stat_db_filename value needs to be an snprintf pattern too, right? What if it doesn't contain the required % specifier? Uh, yes - it needs to be a pattern too, but the user specifies the directory (stats_temp_directory) and this is used to derive all the other values - see assign_pgstat_temp_directory() in guc.c. Also, if you can filter this through pgindent, that would be best. Make sure to add DBWriteRequest to src/tools/pgindent/typedefs_list. Will do. +/* + * There's no request for this DB yet, so lets create it (allocate a + * space for it, set the values). 
+ */ +if (last_statrequests == NULL) +last_statrequests = palloc(sizeof(DBWriteRequest)); +else +last_statrequests = repalloc(last_statrequests, + (num_statrequests + 1) * sizeof(DBWriteRequest)); + +last_statrequests[num_statrequests].databaseid = msg->databaseid; +last_statrequests[num_statrequests].request_time = msg->clock_time; +num_statrequests += 1; Having to repalloc this array each time seems wrong. Would a list instead of an array help? see ilist.c/h; I vote for a dlist because you can easily delete elements from the middle of it, if required (I think you'd need that.) Thanks. I'm not very familiar with the list interface, so I've used a plain array. But yes, there are better ways than doing repalloc all the time. +char db_statfile[strlen(pgstat_stat_db_filename) + 11]; +snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11, + pgstat_stat_filename, dbentry->databaseid); This pattern seems rather frequent. Can we use a macro or similar here? Encapsulating the 11 better would be good. Magic numbers are evil. Yes, this needs to be cleaned / improved. diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 613c1c2..b3467d2 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry PgStat_MsgHdr m_hdr; TimestampTz clock_time; /* observed local clock time */ TimestampTz cutoff_time;/* minimum acceptable file timestamp */ +Oid databaseid; /* requested DB (InvalidOid = all DBs) */ } PgStat_MsgInquiry; Do we need to support the case that somebody requests stuff from the shared DB? IIRC that's what InvalidOid means in pgstat ... Frankly, I don't know, but I guess we do because it was in the original code, and there are such inquiries right after the database starts (that's why I had to add pgstat_write_db_dummyfile). Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 20:33, Magnus Hagander wrote: Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. This has a warning: pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with no prototype before its definition I plan to do some performance testing, but that will take a while so I wanted to post this before I get distracted. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Ok, thanks for the info. I'll look into that and I'll also post info from some of our production systems (we've deployed a 9.1-backpatched version about 2 weeks ago). T. Original message From: Jeff Janes jeff.ja...@gmail.com Date: To: Tomas Vondra t...@fuzzy.cz Cc: pgsql-hackers@postgresql.org Subject: Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 20:33, Magnus Hagander wrote: Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. This has a warning: pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with no prototype before its definition I plan to do some performance testing, but that will take a while so I wanted to post this before I get distracted. Cheers, Jeff
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
Tomas Vondra wrote: diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index be3adf1..4ec485e 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -64,10 +64,14 @@ /* -- * Paths for the statistics files (relative to installation's $PGDATA). + * Permanent and temprorary, global and per-database files. Note typo in the line above. -#define PGSTAT_STAT_PERMANENT_FILENAME "global/pgstat.stat" -#define PGSTAT_STAT_PERMANENT_TMPFILE "global/pgstat.tmp" +#define PGSTAT_STAT_PERMANENT_DIRECTORY "stat" +#define PGSTAT_STAT_PERMANENT_FILENAME "stat/global.stat" +#define PGSTAT_STAT_PERMANENT_TMPFILE "stat/global.tmp" +#define PGSTAT_STAT_PERMANENT_DB_FILENAME "stat/%d.stat" +#define PGSTAT_STAT_PERMANENT_DB_TMPFILE "stat/%d.tmp" +char *pgstat_stat_directory = NULL; char *pgstat_stat_filename = NULL; char *pgstat_stat_tmpname = NULL; +char *pgstat_stat_db_filename = NULL; +char *pgstat_stat_db_tmpname = NULL; I don't like the quoted parts very much; it seems awkward to have the snprintf patterns in one place and have them be used in very distant places. Is there a way to improve that? Also, if I understand clearly, the pgstat_stat_db_filename value needs to be an snprintf pattern too, right? What if it doesn't contain the required % specifier? Also, if you can filter this through pgindent, that would be best. Make sure to add DBWriteRequest to src/tools/pgindent/typedefs_list. + /* + * There's no request for this DB yet, so lets create it (allocate a + * space for it, set the values). + */ + if (last_statrequests == NULL) + last_statrequests = palloc(sizeof(DBWriteRequest)); + else + last_statrequests = repalloc(last_statrequests, + (num_statrequests + 1) * sizeof(DBWriteRequest)); + + last_statrequests[num_statrequests].databaseid = msg->databaseid; + last_statrequests[num_statrequests].request_time = msg->clock_time; + num_statrequests += 1; Having to repalloc this array each time seems wrong. Would a list instead of an array help? 
see ilist.c/h; I vote for a dlist because you can easily delete elements from the middle of it, if required (I think you'd need that.) + char db_statfile[strlen(pgstat_stat_db_filename) + 11]; + snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11, + pgstat_stat_filename, dbentry->databaseid); This pattern seems rather frequent. Can we use a macro or similar here? Encapsulating the 11 better would be good. Magic numbers are evil. diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 613c1c2..b3467d2 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry PgStat_MsgHdr m_hdr; TimestampTz clock_time; /* observed local clock time */ TimestampTz cutoff_time;/* minimum acceptable file timestamp */ + Oid databaseid; /* requested DB (InvalidOid = all DBs) */ } PgStat_MsgInquiry; Do we need to support the case that somebody requests stuff from the shared DB? IIRC that's what InvalidOid means in pgstat ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 3.1.2013 20:33, Magnus Hagander wrote: On Thu, Jan 3, 2013 at 8:31 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 18:47, Heikki Linnakangas wrote: How about creating the new directory as a direct subdir of $PGDATA, rather than buried in global? global is supposed to contain data related to shared catalog relations (plus pg_control), so it doesn't seem like the right location for per-database stat files. Also, if we're going to have admins manually zapping the directory (hopefully when the system is offline), that's less scary if the directory is not buried as deep. That's clearly possible and it's a trivial change. I was thinking about that actually, but then I placed the directory into global because that's where the pgstat.stat originally was. Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. Tomas diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index be3adf1..4ec485e 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -64,10 +64,14 @@ /* -- * Paths for the statistics files (relative to installation's $PGDATA). + * Permanent and temprorary, global and per-database files. * -- */ -#define PGSTAT_STAT_PERMANENT_FILENAME global/pgstat.stat -#define PGSTAT_STAT_PERMANENT_TMPFILE global/pgstat.tmp +#define PGSTAT_STAT_PERMANENT_DIRECTORYstat +#define PGSTAT_STAT_PERMANENT_FILENAME stat/global.stat +#define PGSTAT_STAT_PERMANENT_TMPFILE stat/global.tmp +#define PGSTAT_STAT_PERMANENT_DB_FILENAME stat/%d.stat +#define PGSTAT_STAT_PERMANENT_DB_TMPFILE stat/%d.tmp /* -- * Timer definitions. @@ -115,8 +119,11 @@ int pgstat_track_activity_query_size = 1024; * Built from GUC parameter * -- */ +char *pgstat_stat_directory = NULL; char *pgstat_stat_filename = NULL; char *pgstat_stat_tmpname = NULL; +char *pgstat_stat_db_filename = NULL; +char *pgstat_stat_db_tmpname = NULL; /* * BgWriter global statistics counters (unused in other processes). 
@@ -219,11 +226,16 @@ static int localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz	request_time;	/* timestamp of the last write request */
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest *last_statrequests = NULL;
+static int	num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 					 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
+static void pgstat_write_db_dummyfile(Oid databaseid);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
+static bool pgstat_write_statsfile_needed();
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +303,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
-
 /*
  * Public functions called from postmaster follow
  *
@@ -549,8 +566,34 @@ startup_failed:
 void
 pgstat_reset_all(void)
 {
-	unlink(pgstat_stat_filename);
-	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	DIR		   *dir;
+	struct dirent *entry;
+
+	dir = AllocateDir(pgstat_stat_directory);
+	while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
+	{
+		char	fname[strlen(pgstat_stat_directory) + strlen(entry->d_name) + 1];
+
+		if
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 03.01.2013 01:15, Tomas Vondra wrote:
> 2) a new global/stat directory
>
>    The pgstat.stat file was originally saved into the global directory,
>    but with so many files that would get rather messy, so I've created a
>    new global/stat directory and all the files are stored there. This
>    also means we can simply delete the files in that directory when
>    pgstat_reset_all is called.

How about creating the new directory as a direct subdir of $PGDATA, rather than buried in global? global is supposed to contain data related to shared catalog relations (plus pg_control), so it doesn't seem like the right location for per-database stats files. Also, if we're going to have admins manually zapping the directory (hopefully when the system is offline), that's less scary if the directory is not buried as deep.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 3.1.2013 18:47, Heikki Linnakangas wrote:
> On 03.01.2013 01:15, Tomas Vondra wrote:
>> 2) a new global/stat directory
>>
>>    The pgstat.stat file was originally saved into the global directory,
>>    but with so many files that would get rather messy, so I've created a
>>    new global/stat directory and all the files are stored there. This
>>    also means we can simply delete the files in that directory when
>>    pgstat_reset_all is called.
>
> How about creating the new directory as a direct subdir of $PGDATA,
> rather than buried in global? global is supposed to contain data related
> to shared catalog relations (plus pg_control), so it doesn't seem like
> the right location for per-database stats files. Also, if we're going to
> have admins manually zapping the directory (hopefully when the system is
> offline), that's less scary if the directory is not buried as deep.

That's clearly possible, and it's a trivial change. I was actually thinking about that, but then I placed the directory into global because that's where pgstat.stat originally was.

Tomas
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On Thu, Jan 3, 2013 at 8:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
> On 3.1.2013 18:47, Heikki Linnakangas wrote:
>> On 03.01.2013 01:15, Tomas Vondra wrote:
>>> 2) a new global/stat directory
>>>
>>>    The pgstat.stat file was originally saved into the global directory,
>>>    but with so many files that would get rather messy, so I've created a
>>>    new global/stat directory and all the files are stored there. This
>>>    also means we can simply delete the files in that directory when
>>>    pgstat_reset_all is called.
>>
>> How about creating the new directory as a direct subdir of $PGDATA,
>> rather than buried in global? global is supposed to contain data related
>> to shared catalog relations (plus pg_control), so it doesn't seem like
>> the right location for per-database stats files. Also, if we're going to
>> have admins manually zapping the directory (hopefully when the system is
>> offline), that's less scary if the directory is not buried as deep.
>
> That's clearly possible, and it's a trivial change. I was actually
> thinking about that, but then I placed the directory into global because
> that's where pgstat.stat originally was.

Yeah, +1 for a separate directory not in global.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/