Re: [HACKERS] mat views stats
> > > But that seems pretty ugly. Given the lack of previous reports, I'm > personally content to leave this unfixed in the back branches. > > Comments? > > Most instances of this I've seen out in the field have worked around this by just running analyze in the scheduled jobs after the refresh so we're probably good not back patching.
Re: [HACKERS] mat views stats
Jim Mlodgenski writes: > After digging into things further, just making refresh report the stats > for what is it basically doing simplifies and solves it and it is > something we can back patch if that the consensus. See the attached > patch. I've pushed this into HEAD with one non-cosmetic change: the patch tried to pass a uint64 tuple count to pgstat_count_heap_insert(), whose argument was only declared as "int". This would go seriously wrong with matviews having more than INT_MAX rows, which hardly seems out of the question, so I changed pgstat_count_heap_insert() to take int64 instead. I don't think we can make that change in the back branches though. It seems too likely that third-party code might be calling pgstat_count_heap_insert(). We could possibly kluge around this to produce a safe-to-back-patch fix by doing something like pgstat_count_heap_insert(matviewRel, (int) Min(processed, INT_MAX)); But that seems pretty ugly. Given the lack of previous reports, I'm personally content to leave this unfixed in the back branches. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mat views stats
On Wed, Mar 1, 2017 at 8:39 PM, Michael Paquier wrote: > On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski wrote: > > > > > > On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas > wrote: > >> > >> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby > >> wrote: > >> > Certainly easier, but I don't think it'd be better. Matviews really > >> > aren't > >> > the same thing as tables. Off-hand (without reviewing the patch), > update > >> > and > >> > delete counts certainly wouldn't make any sense. "Insert" counts > might, > >> > in > >> > as much as it's how many rows have been added by refreshes. You'd > want a > >> > refresh count too. > >> > >> Regular REFRESH truncates the view and repopulates it, but REFRESH > >> CONCURRENTLY does inserts, updates, and deletes as needed to adjust > >> the contrs that make sense for > >> regular tables are also sensible here. > >> > > > > After digging into things further, just making refresh report the stats > for > > what is it basically doing simplifies and solves it and it is something > we > > can back patch if that the consensus. See the attached patch. > > This is unhappy: > $ git diff master --check > src/backend/commands/matview.c:155: indent with spaces. > +uint64 processed = 0; > > +/* > + * Send the stats to mimic what we are essentially doing. > + * A truncate and insert > + */ > This sentence is unfinished. > > There is also no need to report the number of inserts if WITH NO DATA is > used. > Here is the cleaned up patch diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c index a18c917..94a69dd 100644 --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -30,6 +30,7 @@ #include "executor/spi.h" #include "miscadmin.h" #include "parser/parse_relation.h" +#include "pgstat.h" #include "rewrite/rewriteHandler.h" #include "storage/lmgr.h" #include "storage/smgr.h" @@ -59,7 +60,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self); static void transientrel_shutdown(DestReceiver *self); static void transientrel_destroy(DestReceiver *self); -static void refresh_matview_datafill(DestReceiver *dest, Query *query, +static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query, const char *queryString); static char *make_temptable_name_n(char *tempname, int n); @@ -151,6 +152,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, Oid save_userid; int save_sec_context; int save_nestlevel; + uint64 processed = 0; ObjectAddress address; /* Determine strength of lock needed. */ @@ -322,7 +324,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, /* Generate the data, if wanted. */ if (!stmt->skipData) - refresh_matview_datafill(dest, dataQuery, queryString); + processed = refresh_matview_datafill(dest, dataQuery, queryString); heap_close(matviewRel, NoLock); @@ -345,8 +347,18 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, Assert(matview_maintenance_depth == old_depth); } else + { refresh_by_heap_swap(matviewOid, OIDNewHeap, relpersistence); + /* + * Send the stats to mimic what we are essentially doing. Swapping the heap + * is equilivant to truncating the relation and inserting the new data. + */ + pgstat_count_truncate(matviewRel); + if (!stmt->skipData) + pgstat_count_heap_insert(matviewRel, processed); + } + /* Roll back any GUC changes */ AtEOXact_GUC(false, save_nestlevel); @@ -361,7 +373,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, /* * refresh_matview_datafill */ -static void +static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query, const char *queryString) { @@ -369,6 +381,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query, PlannedStmt *plan; QueryDesc *queryDesc; Query *copied_query; + uint64 processed; /* Lock and rewrite, using a copy to preserve the original query. */ copied_query = copyObject(query); @@ -406,6 +419,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query, /* run the plan */ ExecutorRun(queryDesc, ForwardScanDirection, 0L); + processed = queryDesc->estate->es_processed; + /* and clean up */ ExecutorFinish(queryDesc); ExecutorEnd(queryDesc); @@ -413,6 +428,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query, FreeQueryDesc(queryDesc); PopActiveSnapshot(); + + return processed; } DestReceiver * -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mat views stats
On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski wrote: > > > On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas wrote: >> >> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby >> wrote: >> > Certainly easier, but I don't think it'd be better. Matviews really >> > aren't >> > the same thing as tables. Off-hand (without reviewing the patch), update >> > and >> > delete counts certainly wouldn't make any sense. "Insert" counts might, >> > in >> > as much as it's how many rows have been added by refreshes. You'd want a >> > refresh count too. >> >> Regular REFRESH truncates the view and repopulates it, but REFRESH >> CONCURRENTLY does inserts, updates, and deletes as needed to adjust >> the contrs that make sense for >> regular tables are also sensible here. >> > > After digging into things further, just making refresh report the stats for > what is it basically doing simplifies and solves it and it is something we > can back patch if that the consensus. See the attached patch. This is unhappy: $ git diff master --check src/backend/commands/matview.c:155: indent with spaces. +uint64 processed = 0; +/* + * Send the stats to mimic what we are essentially doing. + * A truncate and insert + */ This sentence is unfinished. There is also no need to report the number of inserts if WITH NO DATA is used. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mat views stats
On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas wrote: > On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby > wrote: > > Certainly easier, but I don't think it'd be better. Matviews really > aren't > > the same thing as tables. Off-hand (without reviewing the patch), update > and > > delete counts certainly wouldn't make any sense. "Insert" counts might, > in > > as much as it's how many rows have been added by refreshes. You'd want a > > refresh count too. > > Regular REFRESH truncates the view and repopulates it, but REFRESH > CONCURRENTLY does inserts, updates, and deletes as needed to adjust > the contents. So I think all the same counters that make sense for > regular tables are also sensible here. > > After digging into things further, just making refresh report the stats for what is it basically doing simplifies and solves it and it is something we can back patch if that the consensus. See the attached patch. diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c index a18c917..4383312 100644 --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -30,6 +30,7 @@ #include "executor/spi.h" #include "miscadmin.h" #include "parser/parse_relation.h" +#include "pgstat.h" #include "rewrite/rewriteHandler.h" #include "storage/lmgr.h" #include "storage/smgr.h" @@ -59,7 +60,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self); static void transientrel_shutdown(DestReceiver *self); static void transientrel_destroy(DestReceiver *self); -static void refresh_matview_datafill(DestReceiver *dest, Query *query, +static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query, const char *queryString); static char *make_temptable_name_n(char *tempname, int n); @@ -151,6 +152,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, Oid save_userid; int save_sec_context; int save_nestlevel; +uint64 processed = 0; ObjectAddress address; /* Determine strength of lock needed. */ @@ -322,7 +324,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, /* Generate the data, if wanted. */ if (!stmt->skipData) - refresh_matview_datafill(dest, dataQuery, queryString); + processed = refresh_matview_datafill(dest, dataQuery, queryString); heap_close(matviewRel, NoLock); @@ -345,8 +347,17 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, Assert(matview_maintenance_depth == old_depth); } else +{ refresh_by_heap_swap(matviewOid, OIDNewHeap, relpersistence); +/* + * Send the stats to mimic what we are essentially doing. + * A truncate and insert + */ +pgstat_count_truncate(matviewRel); +pgstat_count_heap_insert(matviewRel, processed); +} + /* Roll back any GUC changes */ AtEOXact_GUC(false, save_nestlevel); @@ -361,7 +372,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, /* * refresh_matview_datafill */ -static void +static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query, const char *queryString) { @@ -369,6 +380,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query, PlannedStmt *plan; QueryDesc *queryDesc; Query *copied_query; +uint64 processed; /* Lock and rewrite, using a copy to preserve the original query. */ copied_query = copyObject(query); @@ -406,6 +418,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query, /* run the plan */ ExecutorRun(queryDesc, ForwardScanDirection, 0L); +processed = queryDesc->estate->es_processed; + /* and clean up */ ExecutorFinish(queryDesc); ExecutorEnd(queryDesc); @@ -413,6 +427,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query, FreeQueryDesc(queryDesc); PopActiveSnapshot(); + +return processed; } DestReceiver * -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mat views stats
On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby wrote: > Certainly easier, but I don't think it'd be better. Matviews really aren't > the same thing as tables. Off-hand (without reviewing the patch), update and > delete counts certainly wouldn't make any sense. "Insert" counts might, in > as much as it's how many rows have been added by refreshes. You'd want a > refresh count too. Regular REFRESH truncates the view and repopulates it, but REFRESH CONCURRENTLY does inserts, updates, and deletes as needed to adjust the contents. So I think all the same counters that make sense for regular tables are also sensible here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mat views stats
On 2/22/17 7:56 AM, Peter Eisentraut wrote: What behavior would we like by default? Refreshing a materialized view is a pretty expensive operation, so I think scheduling an analyze quite aggressively right afterwards is often what you want. I think sending a stats message with the number of inserted rows could make sense. +1 on both counts. And if sane analyze behavior does depend on the stats changes then there's no real advantage to a separate patch. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mat views stats
On 2/22/17 06:31, Jim Mlodgenski wrote: > Matviews already show up in the pg_stat_*_tables and the patch does > leverage the existing pg_stat_*_tables underlying structure, but it > creates more meaningful pg_stat_*_matviews leaving out things like > insert and update counts. But fields like seq_scans and last_analyze are then redundant between the *_tables view and the *_matviews view. Maybe it would make more sense to introduce a new view like you propose and not show them in *_tables anymore? > I was originally thinking 2 patches, but I couldn't think of a way to > trigger the analyze reliably without adding a refresh count or sending > bogus stats. We can certainly send a stats message containing the number > of rows inserted by the refresh, but are we going to also send the > number of deletes as well? Consider a matview that has month to date > data. At the end of the month, there will be about 30n live tuples. The > next day on the new month, there will be n inserts with the stats > thinking there are 30n live tuples which is below the analyze scale > factor. We want to analyze the matview on the first of the day of the > new month, but it wouldn't be triggered for a few days. We can have > REFRESH also track live tuples, but it was quickly becoming a slippery > slope of changing behavior for a back patch. Maybe that's OK and we can > go down that road. For those not reading the patch, it introduces a new reloption autovacuum_analyze_refresh_threshold that determines when to autoanalyze a materialized view. What behavior would we like by default? Refreshing a materialized view is a pretty expensive operation, so I think scheduling an analyze quite aggressively right afterwards is often what you want. I think sending a stats message with the number of inserted rows could make sense. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mat views stats
On Wed, Feb 22, 2017 at 12:43 AM, Jim Nasby wrote: > On 2/21/17 4:22 PM, Peter Eisentraut wrote: > >> Attached is a patch to trigger autovacuum based on a matview refresh >>> along with a system view pg_stat_all_matviews to show information more >>> meaningful for materialized views. >>> >> It might be easier to include materialized views into pg_stat_*_tables. >> > > Certainly easier, but I don't think it'd be better. Matviews really aren't > the same thing as tables. Off-hand (without reviewing the patch), update > and delete counts certainly wouldn't make any sense. "Insert" counts might, > in as much as it's how many rows have been added by refreshes. You'd want a > refresh count too. Matviews already show up in the pg_stat_*_tables and the patch does leverage the existing pg_stat_*_tables underlying structure, but it creates more meaningful pg_stat_*_matviews leaving out things like insert and update counts. > > > I think these should be two separate patches. We might want to >> backpatch the first one. >> > I was originally thinking 2 patches, but I couldn't think of a way to trigger the analyze reliably without adding a refresh count or sending bogus stats. We can certainly send a stats message containing the number of rows inserted by the refresh, but are we going to also send the number of deletes as well? Consider a matview that has month to date data. At the end of the month, there will be about 30n live tuples. The next day on the new month, there will be n inserts with the stats thinking there are 30n live tuples which is below the analyze scale factor. We want to analyze the matview on the first of the day of the new month, but it wouldn't be triggered for a few days. We can have REFRESH also track live tuples, but it was quickly becoming a slippery slope of changing behavior for a back patch. Maybe that's OK and we can go down that road. We can back patch some documentation about the existing refresh behavior with autovacuum.
Re: [HACKERS] mat views stats
On 2/21/17 4:22 PM, Peter Eisentraut wrote: Attached is a patch to trigger autovacuum based on a matview refresh along with a system view pg_stat_all_matviews to show information more meaningful for materialized views. It might be easier to include materialized views into pg_stat_*_tables. Certainly easier, but I don't think it'd be better. Matviews really aren't the same thing as tables. Off-hand (without reviewing the patch), update and delete counts certainly wouldn't make any sense. "Insert" counts might, in as much as it's how many rows have been added by refreshes. You'd want a refresh count too. I think these should be two separate patches. We might want to backpatch the first one. +1; definitely sounds like a bug to me. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mat views stats
On 2/20/17 10:06, Jim Mlodgenski wrote: > I've come across a number of times where the statistics on materialized > views become stale producing bad plans. It turns out that autovacuum > only touches a materialized view when it is first created and ignores it > on a refresh. When you have a materialized view like yesterdays_sales > the data in the materialized view turns over every day. That sounds like a bug. > Attached is a patch to trigger autovacuum based on a matview refresh > along with a system view pg_stat_all_matviews to show information more > meaningful for materialized views. It might be easier to include materialized views into pg_stat_*_tables. I think these should be two separate patches. We might want to backpatch the first one. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] mat views stats
I've come across a number of times where the statistics on materialized views become stale producing bad plans. It turns out that autovacuum only touches a materialized view when it is first created and ignores it on a refresh. When you have a materialized view like yesterdays_sales the data in the materialized view turns over every day. Attached is a patch to trigger autovacuum based on a matview refresh along with a system view pg_stat_all_matviews to show information more meaningful for materialized views. -- Jim diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index fad5cb0..ec27e2c 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -436,6 +436,21 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + pg_stat_all_matviewspg_stat_all_matviews + + One row for each materialized view in the current database, showing statistics + about accesses to that specific materialized view. + See for details. + + + + + pg_stat_user_matviewspg_stat_user_matviews + Same as pg_stat_all_matviews, except that only + user materialized views are shown. + + + pg_statio_all_tablespg_statio_all_tables One row for each table in the current database, showing statistics @@ -2277,6 +2292,97 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + pg_stat_all_matviews View + + + + Column + Type + Description + + + + + + relid + oid + OID of this materialize view + + + schemaname + name + Name of the schema that this materialized view is in + + + relname + name + Name of this materialized view + + + seq_scan + bigint + Number of sequential scans initiated on this materialized view + + + seq_tup_read + bigint + Number of live rows fetched by sequential scans + + + idx_scan + bigint + Number of index scans initiated on this materialized ciew + + + idx_tup_fetch + bigint + Number of live rows fetched by index scans + + + last_refresh + timestamp with time zone + Last time at which this materialized view was refreshed + + + refresh_count + bigint + Number of times this materialized view has been refreshed + + + last_analyze + timestamp with time zone + Last time at which this materialized view was manually analyzed + + + last_autoanalyze + timestamp with time zone + Last time at which this materialized ciew was analyzed by the + autovacuum daemon + + + analyze_count + bigint + Number of times this materialized view has been manually analyzed + + + autoanalyze_count + bigint + Number of times this materialized view has been analyzed by the + autovacuum daemon + + + + + + + The pg_stat_all_matviews view will contain + one row for each materialized view in the current database, showing + statistics about accesses to that specific materialized view. The + pg_stat_user_matviews contain the same + information, but filtered to only show user materialized views. + + pg_statio_all_tables View diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 816483e..8d60d48 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -164,6 +164,15 @@ static relopt_int intRelOpts[] = }, -1, 0, INT_MAX }, +{ +{ +"autovacuum_analyze_refresh_threshold", +"Minimum number of materialized view refreshes prior to analyze", +RELOPT_KIND_HEAP, +ShareUpdateExclusiveLock +}, +-1, 0, INT_MAX +}, { { "autovacuum_vacuum_cost_delay", @@ -1283,6 +1292,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind) offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, vacuum_threshold)}, {"autovacuum_analyze_threshold", RELOPT_TYPE_INT, offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_threshold)}, +{"autovacuum_analyze_refresh_threshold", RELOPT_TYPE_INT, +offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_ref_threshold)}, {"autovacuum_vacuum_cost_delay", RELOPT_TYPE_INT, offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, vacuum_cost_delay)}, {"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT, diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 38be9cf..07b9f1c 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -535,6 +535,28 @@ CREATE VIEW pg_stat_xact_all