Re: [HACKERS] mat views stats

2017-03-19 Thread Jim Mlodgenski
>
>
> But that seems pretty ugly.  Given the lack of previous reports, I'm
> personally content to leave this unfixed in the back branches.
>
> Comments?
>
> Most instances of this I've seen out in the field have worked around this
by just running analyze in the scheduled jobs after the refresh  so we're
probably good not back patching.


Re: [HACKERS] mat views stats

2017-03-18 Thread Tom Lane
Jim Mlodgenski  writes:
> After digging into things further, just making refresh report the stats
> for what is it basically doing simplifies and solves it and it is
> something we can back patch if that the consensus. See the attached
> patch.

I've pushed this into HEAD with one non-cosmetic change: the patch tried
to pass a uint64 tuple count to pgstat_count_heap_insert(), whose argument
was only declared as "int".  This would go seriously wrong with matviews
having more than INT_MAX rows, which hardly seems out of the question,
so I changed pgstat_count_heap_insert() to take int64 instead.

I don't think we can make that change in the back branches though.
It seems too likely that third-party code might be calling 
pgstat_count_heap_insert().

We could possibly kluge around this to produce a safe-to-back-patch
fix by doing something like

pgstat_count_heap_insert(matviewRel, (int) Min(processed, INT_MAX));

But that seems pretty ugly.  Given the lack of previous reports, I'm
personally content to leave this unfixed in the back branches.

Comments?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-03-04 Thread Jim Mlodgenski
On Wed, Mar 1, 2017 at 8:39 PM, Michael Paquier 
wrote:

> On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski  wrote:
> >
> >
> > On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas 
> wrote:
> >>
> >> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby 
> >> wrote:
> >> > Certainly easier, but I don't think it'd be better. Matviews really
> >> > aren't
> >> > the same thing as tables. Off-hand (without reviewing the patch),
> update
> >> > and
> >> > delete counts certainly wouldn't make any sense. "Insert" counts
> might,
> >> > in
> >> > as much as it's how many rows have been added by refreshes. You'd
> want a
> >> > refresh count too.
> >>
> >> Regular REFRESH truncates the view and repopulates it, but REFRESH
> >> CONCURRENTLY does inserts, updates, and deletes as needed to adjust
> >> the contrs that make sense for
> >> regular tables are also sensible here.
> >>
> >
> > After digging into things further, just making refresh report the stats
> for
> > what is it basically doing simplifies and solves it and it is something
> we
> > can back patch if that the consensus. See the attached patch.
>
> This is unhappy:
> $ git diff master --check
> src/backend/commands/matview.c:155: indent with spaces.
> +uint64  processed = 0;
>
> +/*
> + * Send the stats to mimic what we are essentially doing.
> + * A truncate and insert
> + */
> This sentence is unfinished.
>
> There is also no need to report the number of inserts if WITH NO DATA is
> used.
>


Here is the cleaned up patch
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a18c917..94a69dd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -30,6 +30,7 @@
 #include "executor/spi.h"
 #include "miscadmin.h"
 #include "parser/parse_relation.h"
+#include "pgstat.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/lmgr.h"
 #include "storage/smgr.h"
@@ -59,7 +60,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty
 static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
-static void refresh_matview_datafill(DestReceiver *dest, Query *query,
+static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
 		 const char *queryString);
 
 static char *make_temptable_name_n(char *tempname, int n);
@@ -151,6 +152,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	uint64  processed = 0;
 	ObjectAddress address;
 
 	/* Determine strength of lock needed. */
@@ -322,7 +324,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 
 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString);
+		processed = refresh_matview_datafill(dest, dataQuery, queryString);
 
 	heap_close(matviewRel, NoLock);
 
@@ -345,8 +347,18 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 		Assert(matview_maintenance_depth == old_depth);
 	}
 	else
+	{
 		refresh_by_heap_swap(matviewOid, OIDNewHeap, relpersistence);
 
+		/*
+		 * Send the stats to mimic what we are essentially doing. Swapping the heap
+		 * is equilivant to truncating the relation and inserting the new data.
+		 */
+		pgstat_count_truncate(matviewRel);
+		if (!stmt->skipData)
+			pgstat_count_heap_insert(matviewRel, processed);
+	}
+
 	/* Roll back any GUC changes */
 	AtEOXact_GUC(false, save_nestlevel);
 
@@ -361,7 +373,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 /*
  * refresh_matview_datafill
  */
-static void
+static uint64
 refresh_matview_datafill(DestReceiver *dest, Query *query,
 		 const char *queryString)
 {
@@ -369,6 +381,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	PlannedStmt *plan;
 	QueryDesc  *queryDesc;
 	Query	   *copied_query;
+	uint64 processed;
 
 	/* Lock and rewrite, using a copy to preserve the original query. */
 	copied_query = copyObject(query);
@@ -406,6 +419,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	/* run the plan */
 	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
 
+	processed = queryDesc->estate->es_processed;
+
 	/* and clean up */
 	ExecutorFinish(queryDesc);
 	ExecutorEnd(queryDesc);
@@ -413,6 +428,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	FreeQueryDesc(queryDesc);
 
 	PopActiveSnapshot();
+
+	return processed;
 }
 
 DestReceiver *

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-03-01 Thread Michael Paquier
On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski  wrote:
>
>
> On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas  wrote:
>>
>> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby 
>> wrote:
>> > Certainly easier, but I don't think it'd be better. Matviews really
>> > aren't
>> > the same thing as tables. Off-hand (without reviewing the patch), update
>> > and
>> > delete counts certainly wouldn't make any sense. "Insert" counts might,
>> > in
>> > as much as it's how many rows have been added by refreshes. You'd want a
>> > refresh count too.
>>
>> Regular REFRESH truncates the view and repopulates it, but REFRESH
>> CONCURRENTLY does inserts, updates, and deletes as needed to adjust
>> the contrs that make sense for
>> regular tables are also sensible here.
>>
>
> After digging into things further, just making refresh report the stats for
> what is it basically doing simplifies and solves it and it is something we
> can back patch if that the consensus. See the attached patch.

This is unhappy:
$ git diff master --check
src/backend/commands/matview.c:155: indent with spaces.
+uint64  processed = 0;

+/*
+ * Send the stats to mimic what we are essentially doing.
+ * A truncate and insert
+ */
This sentence is unfinished.

There is also no need to report the number of inserts if WITH NO DATA is used.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-03-01 Thread Jim Mlodgenski
On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas  wrote:

> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby 
> wrote:
> > Certainly easier, but I don't think it'd be better. Matviews really
> aren't
> > the same thing as tables. Off-hand (without reviewing the patch), update
> and
> > delete counts certainly wouldn't make any sense. "Insert" counts might,
> in
> > as much as it's how many rows have been added by refreshes. You'd want a
> > refresh count too.
>
> Regular REFRESH truncates the view and repopulates it, but REFRESH
> CONCURRENTLY does inserts, updates, and deletes as needed to adjust
> the contents.  So I think all the same counters that make sense for
> regular tables are also sensible here.
>
>
After digging into things further, just making refresh report the stats for
what is it basically doing simplifies and solves it and it is something we
can back patch if that the consensus. See the attached patch.
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a18c917..4383312 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -30,6 +30,7 @@
 #include "executor/spi.h"
 #include "miscadmin.h"
 #include "parser/parse_relation.h"
+#include "pgstat.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/lmgr.h"
 #include "storage/smgr.h"
@@ -59,7 +60,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty
 static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
-static void refresh_matview_datafill(DestReceiver *dest, Query *query,
+static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
 		 const char *queryString);
 
 static char *make_temptable_name_n(char *tempname, int n);
@@ -151,6 +152,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+uint64  processed = 0;
 	ObjectAddress address;
 
 	/* Determine strength of lock needed. */
@@ -322,7 +324,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 
 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString);
+		processed = refresh_matview_datafill(dest, dataQuery, queryString);
 
 	heap_close(matviewRel, NoLock);
 
@@ -345,8 +347,17 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 		Assert(matview_maintenance_depth == old_depth);
 	}
 	else
+{
 		refresh_by_heap_swap(matviewOid, OIDNewHeap, relpersistence);
 
+/* 
+ * Send the stats to mimic what we are essentially doing. 
+ * A truncate and insert 
+ */
+pgstat_count_truncate(matviewRel);
+pgstat_count_heap_insert(matviewRel, processed);
+}
+
 	/* Roll back any GUC changes */
 	AtEOXact_GUC(false, save_nestlevel);
 
@@ -361,7 +372,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 /*
  * refresh_matview_datafill
  */
-static void
+static uint64 
 refresh_matview_datafill(DestReceiver *dest, Query *query,
 		 const char *queryString)
 {
@@ -369,6 +380,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	PlannedStmt *plan;
 	QueryDesc  *queryDesc;
 	Query	   *copied_query;
+uint64 processed;
 
 	/* Lock and rewrite, using a copy to preserve the original query. */
 	copied_query = copyObject(query);
@@ -406,6 +418,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	/* run the plan */
 	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
 
+processed = queryDesc->estate->es_processed;
+
 	/* and clean up */
 	ExecutorFinish(queryDesc);
 	ExecutorEnd(queryDesc);
@@ -413,6 +427,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	FreeQueryDesc(queryDesc);
 
 	PopActiveSnapshot();
+
+return processed;
 }
 
 DestReceiver *

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-02-26 Thread Robert Haas
On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby  wrote:
> Certainly easier, but I don't think it'd be better. Matviews really aren't
> the same thing as tables. Off-hand (without reviewing the patch), update and
> delete counts certainly wouldn't make any sense. "Insert" counts might, in
> as much as it's how many rows have been added by refreshes. You'd want a
> refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contents.  So I think all the same counters that make sense for
regular tables are also sensible here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-02-22 Thread Jim Nasby

On 2/22/17 7:56 AM, Peter Eisentraut wrote:

What behavior would we like by default?  Refreshing a materialized view
is a pretty expensive operation, so I think scheduling an analyze quite
aggressively right afterwards is often what you want.

I think sending a stats message with the number of inserted rows could
make sense.


+1 on both counts. And if sane analyze behavior does depend on the stats 
changes then there's no real advantage to a separate patch.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-02-22 Thread Peter Eisentraut
On 2/22/17 06:31, Jim Mlodgenski wrote:
> Matviews already show up in the pg_stat_*_tables and the patch does
> leverage the existing pg_stat_*_tables underlying structure, but it
> creates more meaningful pg_stat_*_matviews leaving out things like
> insert and update counts.  

But fields like seq_scans and last_analyze are then redundant between
the *_tables view and the *_matviews view.  Maybe it would make more
sense to introduce a new view like you propose and not show them in
*_tables anymore?

> I was originally thinking 2 patches, but I couldn't think of a way to
> trigger the analyze reliably without adding a refresh count or sending
> bogus stats. We can certainly send a stats message containing the number
> of rows inserted by the refresh, but are we going to also send the
> number of deletes as well? Consider a matview that has month to date
> data. At the end of the month, there will be about 30n live tuples. The
> next day on the new month, there will be n inserts with the stats
> thinking there are 30n live tuples which is below the analyze scale
> factor.  We want to analyze the matview on the first of the day of the
> new month, but it wouldn't be triggered for a few days. We can have
> REFRESH also track live tuples, but it was quickly becoming a slippery
> slope of changing behavior for a back patch. Maybe that's OK and we can
> go down that road.

For those not reading the patch, it introduces a new reloption
autovacuum_analyze_refresh_threshold that determines when to autoanalyze
a materialized view.

What behavior would we like by default?  Refreshing a materialized view
is a pretty expensive operation, so I think scheduling an analyze quite
aggressively right afterwards is often what you want.

I think sending a stats message with the number of inserted rows could
make sense.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-02-22 Thread Jim Mlodgenski
On Wed, Feb 22, 2017 at 12:43 AM, Jim Nasby 
wrote:

> On 2/21/17 4:22 PM, Peter Eisentraut wrote:
>
>> Attached is a patch to trigger autovacuum based on a matview refresh
>>> along with a system view pg_stat_all_matviews to show information more
>>> meaningful for materialized views.
>>>
>> It might be easier to include materialized views into pg_stat_*_tables.
>>
>
> Certainly easier, but I don't think it'd be better. Matviews really aren't
> the same thing as tables. Off-hand (without reviewing the patch), update
> and delete counts certainly wouldn't make any sense. "Insert" counts might,
> in as much as it's how many rows have been added by refreshes. You'd want a
> refresh count too.


Matviews already show up in the pg_stat_*_tables and the patch does
leverage the existing pg_stat_*_tables underlying structure, but it creates
more meaningful pg_stat_*_matviews leaving out things like insert and
update counts.


>
>
> I think these should be two separate patches.  We might want to
>> backpatch the first one.
>>
>
I was originally thinking 2 patches, but I couldn't think of a way to
trigger the analyze reliably without adding a refresh count or sending
bogus stats. We can certainly send a stats message containing the number of
rows inserted by the refresh, but are we going to also send the number of
deletes as well? Consider a matview that has month to date data. At the end
of the month, there will be about 30n live tuples. The next day on the new
month, there will be n inserts with the stats thinking there are 30n live
tuples which is below the analyze scale factor.  We want to analyze the
matview on the first of the day of the new month, but it wouldn't be
triggered for a few days. We can have REFRESH also track live tuples, but
it was quickly becoming a slippery slope of changing behavior for a back
patch. Maybe that's OK and we can go down that road.

We can back patch some documentation about the existing refresh behavior
with autovacuum.


Re: [HACKERS] mat views stats

2017-02-21 Thread Jim Nasby

On 2/21/17 4:22 PM, Peter Eisentraut wrote:

Attached is a patch to trigger autovacuum based on a matview refresh
along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.


Certainly easier, but I don't think it'd be better. Matviews really 
aren't the same thing as tables. Off-hand (without reviewing the patch), 
update and delete counts certainly wouldn't make any sense. "Insert" 
counts might, in as much as it's how many rows have been added by 
refreshes. You'd want a refresh count too.



I think these should be two separate patches.  We might want to
backpatch the first one.


+1; definitely sounds like a bug to me.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-02-21 Thread Peter Eisentraut
On 2/20/17 10:06, Jim Mlodgenski wrote:
> I've come across a number of times where the statistics on materialized
> views become stale producing bad plans. It turns out that autovacuum
> only touches a materialized view when it is first created and ignores it
> on a refresh. When you have a materialized view like yesterdays_sales
> the data in the materialized view turns over every day. 

That sounds like a bug.

> Attached is a patch to trigger autovacuum based on a matview refresh
> along with a system view pg_stat_all_matviews to show information more
> meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

I think these should be two separate patches.  We might want to
backpatch the first one.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers