Re: Updating MATERIALIZED VIEWs (Re: [HACKERS] delta relations in AFTER triggers)

2017-01-21 Thread Jim Nasby

On 1/20/17 5:38 PM, Nico Williams wrote:

If these triggers could be automatically generated, that sure would be
nice, but some control would be needed over when to update the MV vs.
mark it as needing a refresh.


FWIW, pg_classy[1], which is still a work in progress, would allow for 
that. The idea is that you define a code template which you can then 
call with arguments (such as the name of a matview table), and those 
arguments get put into the template before executing the resulting SQL.


Most of the basic framework for that is in place; I just need to finish 
code that will allow for the extension to track arbitrary database 
objects that were created.


1: https://github.com/decibel/pg_classy/blob/master/doc/pg_classy.asc
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Updating MATERIALIZED VIEWs (Re: [HACKERS] delta relations in AFTER triggers)

2017-01-20 Thread Nico Williams
[Looking at your patch I see that it's not quite related to MVs, so I'm
changing the Subject.  Apologies for the noise.]

[Responding out of order.]

On Fri, Jan 20, 2017 at 03:37:20PM -0600, Kevin Grittner wrote:
> On Fri, Jan 20, 2017 at 2:08 PM, Nico Williams  wrote:
> > BTW, automatic updates of certain types of MVs should be easy: add
> > constraints based on NEW/OLD rows from synthetic triggers to the
> > underlying query.
> 
> Convincing me that this is a good idea for actual MVs, versus
> pseudo-MVs using tables, would be an uphill battle.  [...]

I don't think it's necessary, and I would not insist on it.

My alternative MV implementation lets _me_ choose when to update an MV
synchronously, and when to defer refreshes, by using [hand-coded]
triggers.  This is good enough for me.

If these triggers could be automatically generated, that sure would be
nice, but some control would be needed over when to update the MV vs.
mark it as needing a refresh.

> > Our intention is to contribute this.  We're willing to sign
> > reasonable contribution agreements.
> 
> Posting a patch to these lists constitutes an assertion that you
> have authority to share the IP, and are doing so.  Referencing a
> URL is a bit iffy, since it doesn't leave an archival copy of the
> contribution under the community's control.

Fair enough.  I'll post the source file itself.  I've not done the work
of properly integrating it because I need to gauge interest first,
before dedicating a lot of effort to it.

> I am dubious, though, of the approach in general, as stated above.

I'm using this _now_.  With a caveat:

a) the trigger functions needed to either mark an MV as needing a
refresh, or else to update it directly, are hand-coded, and

b) I chose which operations yield synchronous MV updates and which defer
to a refresh.

The MV, in my scheme, is really just a table with triggers that update a
deltas table the same way that a refresh would.  A refresh locks the
table, disables those triggers, populates another table with the current
output of the underlying view, compares to the previous materialization,
and lastly generates, records, and applies deltas to the
materialization.

To give an example, adding a user to a group -> generally fast; deleting
a user (and thus all their group memberships) -> potentially very slow.

The "add a user to a group" case can then yield near real-time updates
of external caches, while the other case results in a deferred REFRESH
so as to not slow down the current transaction.  The deferred REFRESH is
not deferred too long, so the net effect is still very fast updates of
external caches.

> > However, there is a bug in the query planner that prevents this
> > from being very fast.  At some point I want to tackle that bug.
> 
> What bug is that?

I... asked for help on the IRC #postgresql channel.  I never posted here
about it.

Revisiting it now... the main problem was query _preparation time_, not
execution time.  So perhaps not so bad.  Still, it's worth looking into.

The query was something like this:

SELECT v.data->'verb_name' || '^' || (r.data->'named_acl_name') AS 
grant_name,
   grantee.right_entity_id AS grantee_id
FROM relationships grantee
JOIN relationships grant ON
 grantee.left_entity_id = grant.right_entity_id AND
 grantee.relationship_type_id IN (10421, 10431, 13591, 13921)
 AND grant.relationship_type_id = 10331
JOIN relationships perm_actions ON
 grantee.left_entity_id = perm_actions.right_entity_id AND
 perm_actions.relationship_type_id = 10381
JOIN relationships verb_in_vs ON
 verb_in_vs.right_entity_id = perm_actions.left_entity_id AND
 verb_in_vs.relationship_type_id = 10371
JOIN entities v ON v.id = verb_in_vs.left_entity_id
JOIN entities r ON r.id = grant.left_entity_id;

  (This query uses a bit of an EAV schema.  There's an "entities" table
  with an hstore column for storing attributes ("data") and another
  table, "relationships" that has (relationship_type_id, left_entity_id,
  right_entity_id) columns and which is indexed by both, left_entity_id
  and right_entity_id.  EAV schemas hide relevant information from the
  query planner, so there is that.)

The query plan for this is about as fast as one could hope.  After all,
it has to scan many of the rows.

Now suppose we were adding a new 'grantee' and wanted to generate the
additions that would result in the MV.  We could add this constraint to
the query:

WHERE grantee.left_entity_id = NEW.left_entity_id AND
grantee.right_entity_id = NEW.right_entity_id;

Now we've basically [almost] fully-specified the primary key for the
grantee table source.

The resulting query plan is actually pretty good.  It has the grantee
table source as the first table source in the inner-most loop.

If I re-write the query using WITH CTEs to get a similarly good plan,