Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-26 Thread Dent John
Hi Nico,

By the way, I do agree with your point about MERGE — if we can factor MV 
updates in that fashion, it will certainly save.

I didn’t reply immediately because your point caught me off guard:

> […]  If you look at my
> sketch for how to do it, you'll notice that many of the sorts of queries
> that one would choose to materialize... are not really amenable to this
> treatment […]

I’d rather presumed that there would be many examples of DML on base relations 
that could trigger a direct (incremental) update to the MV. I had presumed it, 
but not actually done any research.

So I did a bit of a trawl. There’s actually quite a lot of academic research 
out there, including [1] and [2]. [2] references a bunch of methods for 
incremental MV refresh, and ties them into a graph query context. I’m not sure 
if the graph query context itself is relevant for Postgres, but it’s certainly 
interesting and perhaps suggests that incremental refresh of at least some 
RECURSIVE MVs may not be entirely impossible. I also found [3], which is /very/ 
dated, but it strongly supports that MVs are a performant path to executing 
certain types of query.

So I definitely agree with you in the general case, but it seems there is scope 
to provide an incremental MV refresh capability that is broadly useful.

Almost certainly, any initial implementation would quickly fall back to a “full 
refresh”. But the refresh planner’s capability develops, I wonder if it could 
not embody an intelligent strategy that might even recognise common recursive 
patterns such as the transitive closure you mention, and refresh on an 
incremental basis — that would be really quite a cool capability to have.

denty.

[1] http://www.dbnet.ece.ntua.gr/pubs/uploads/TR-1998-14.ps 

[2] https://arxiv.org/pdf/1806.07344.pdf 
[3] https://www.cs.indiana.edu/pub/techreports/TR280.pdf 



Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-19 Thread Nico Williams
On Tue, Jun 19, 2018 at 08:46:06AM +0100, Dent John wrote:
> I’m pretty impressed anything in this space can be written entirely in
> PlPGQSL!

https://github.com/twosigma/postgresql-contrib

PG is quite powerful!

I have even implemented a COMMIT TRIGGER in pure PlPgSQL.

You'll notice I make extensive use of record/table types.

> If you did integrate your implementation, it would be easy for my
> Extension to read from a table other than the one which it gets the MV
> definition from... Although having said that, if you went down the
> route you suggest, would not you make that “regular table” into a
> first class scheme object very much like Corey’s CONTINUOUS
> MATERIALIZED VIEW object concept?

I know nothing about the CONTINUOUS MATERIALIZED VIEW concept.  What
that would imply to me seems... difficult to achieve.  There will be
view queries that are difficult or impossible to automatically write
triggers for that update an MV synchronously.

> It is interesting that you can put triggers onto the table though, as
> that leads well in to use cases where it is desirable to “stack” MVs
> upon each other. (I’m not immediately sure whether such a use case is
> still needed in face of an always-up-to-date MV feature such as is
> described, but I’ve seen it elsewhere.)

I have done exactly this sort of MV chaining.

In my use case I had an MV of a nesting group transitive closure and
then another of a join between that and user group memberships to get a
complete user group transitive closure.

The transitive closure of nesting groups being computed via a RECURSIVE
CTE...  In principle one can understand such a query and automatically
write DMLs to update the MV on the fly (I've done this _manually_), but
the moment you do any bulk updates out of sequence you can't, and then
you have to refresh the view, so you see, I don't quite believe we can
have a true continuously materialized view :(

For me the key requirement is the ability to generate incremental
updates to an external system, but also the whole thing has to be fast.

> You say you’d like to base it off a VIEW’s AST (rather than, I
> presume, you must parse the reconstructed VIEW source text as SQL?),

PG already stores the AST.  There's no need to write a new parser when
PG already has one.  At the end of the day you need to analyze an AST
for the MV's source query in order to automatically write the triggers
to keep it updated (or mark it as needing a refresh).

> and I do agree — that’s probably the right direction... it does seem
> to me there is scope to leverage the “bottom half” of the ASSERTION
> stuff from Dave Fetter that Corey linked to — i.e., the part of it
> that manages triggers. Still leaves the AST crawling deciding what to
> actually do once a change is caught.

I'll search for this.

> Really good to hear about progress in this area.

Eh, I've not actually implemented any automatic generation of triggers
to update MVs.  I've written enough such triggers manually to believe
that *some* of them could be written by software.  If you look at my
sketch for how to do it, you'll notice that many of the sorts of queries
that one would choose to materialize... are not really amenable to this
treatment -- that's precisely because those make for the sorts of slow
queries that make you reach for materialization in the first place :(

But even so, automatically-generated triggers that mark an MV as needing
a refresh are always possible, and that is a huge improvement anyways,
especially if concurrent view refreshes can be made to go faster (by
having PKs on the MVs).  The idea is to have some sort of adaptive
automatic background, concurrent MV refresh running on a frequency based
in part of the amount of time it takes to refresh the VIEW.

BTW, MERGE would be a significant optimization for concurrent MV
refresh.  Think of MERGE as a statement that can scan a source, FULL
OUTER JOIN it to a target table, and for each row do an INSERT, UPDATE,
or DELETE -- this is 3x faster than the three INSERT/UPDATE/DELETE
statements you need to do the same work without a MERGE!

Nico
-- 



Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-19 Thread Dent John
Hi Nico,

I’m pretty impressed anything in this space can be written entirely in PlPGQSL!

If you did integrate your implementation, it would be easy for my Extension to 
read from a table other than the one which it gets the MV definition from... 
Although having said that, if you went down the route you suggest, would not 
you make that “regular table” into a first class scheme object very much like 
Corey’s CONTINUOUS MATERIALIZED VIEW object concept?

It is interesting that you can put triggers onto the table though, as that 
leads well in to use cases where it is desirable to “stack” MVs upon each 
other. (I’m not immediately sure whether such a use case is still needed in 
face of an always-up-to-date MV feature such as is described, but I’ve seen it 
elsewhere.)

You say you’d like to base it off a VIEW’s AST (rather than, I presume, you 
must parse the reconstructed VIEW source text as SQL?), and I do agree — that’s 
probably the right direction... it does seem to me there is scope to leverage 
the “bottom half” of the ASSERTION stuff from Dave Fetter that Corey linked to 
— i.e., the part of it that manages triggers. Still leaves the AST crawling 
deciding what to actually do once a change is caught.

Really good to hear about progress in this area. 

d.




Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-18 Thread Nico Williams
On Mon, Jun 18, 2018 at 07:38:13PM +0100, Dent John wrote:
> I commented to Corey (privately) that, while my rewrite extension has
> gotten me a server that responds quickly to aggregate queries, the
> constant need to refresh the supporting MVs means the system’s load
> average is constant and much higher than before. I’m happy with the
> tradeoff for now, but it’s a huge waste of energy, and I’m sure it
> must thrash my disk.
> 
> I’m very interested in what other people think of Corey’s idea.

I've written an alternative materialization extension (entirely as
PlPgSQL) based on PG's internals, but my version has a few big wins that
might help here.  I'm thinking of properly integrating it with PG.  Some
of the features include:

 - you can write triggers that update the materialization

   This is because the materialization is just a regular table in my
   implementation.

 - you can mark a view as needing a refresh (e.g., in a trigger)

 - you can declare a PK, other constraints, and indexes on a
   materialization

   The DMLs used to refresh a view concurrently can take advantage of
   the PK and/or other indexes to go fast.

 - you get a history table which records updates to the materialization

   This is useful for generating incremental updates to external
   systems.

Keeping track of refresh times should help decide whether to use or not
use a materialization in some query, or whether to refresh it first, or
not use it at all.

One of the things I'd eventually like to do is analyze the view query
AST to automatically generate triggers to update materializations or
mark them as needing refreshes.  A first, very very rough sketch of such
an analysis looks like this:

 - if the view query has CTEs
   -> create triggers on all its table sources to mark the
  materialization as needing a refresh

 - else if a table appears more than once as a table source in the view
   query
   -> create triggers on that table that mark the materialization as
  needing a refresh

 - else if a table appears anywhere other than the top-level
   -> create triggers .. mark as needing refresh

 - else if a table is a right-side of a left join
   -> create triggers .. mark as needing refresh

 - else if a table has no PK
   -> create triggers .. mark as needing refresh

 - else if the query has no GROUP BY, or only does a GROUP BY on this
   table and a list of columns prefixed by the table's PK
   -> rewrite the query to have WHERE eq conditions on values for the
  table's PK columns
  
  analyze this query

  if the result shows this table source as the first table in the
  plan
  -> create triggers on this table to update the materialization
 directly from querying the source view

 - else
   -> create triggers .. mark as needing refresh


Nico
-- 



Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-18 Thread Dent John
I commented to Corey (privately) that, while my rewrite extension has gotten me 
a server that responds quickly to aggregate queries, the constant need to 
refresh the supporting MVs means the system’s load average is constant and much 
higher than before. I’m happy with the tradeoff for now, but it’s a huge waste 
of energy, and I’m sure it must thrash my disk.

I’m very interested in what other people think of Corey’s idea.


Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-18 Thread Corey Huinker
>
> Hope it is useful or interesting for someone! Questions or comments are
>> very welcome.
>>
>
> good idea.
>
> Regards
>
> Pavel
>

In a recent PgConf NYC presentation [1] I was talking about the technical
hurdles to implementing materialized views that could be kept up to date at
all times, and the benefits of having such a thing.

Some use cases can be addressed with eventually-consistent derivative table
structures (Vertica's projections, PipelineDB's continuous views, etc), but
those methods rely on the source data never having deletes or updates, or
confining those updates to the "hot" part of the source tables, so it
generally works for time-series data, but not for other cases.

It has occurred to me that Dave Fetter's work on ASSERTIONS [2] has common
underpinnings with true continuous materialized views. In both cases, the
creation of a system object causes the creations of insert/update/delete
triggers on one or more existing tables. In the case of assertions, those
triggers are run with the goal of raising an error if rows are returned
from a query. In the case of a materialized view, those same triggers would
be used to delete rows from a CMV and insert replacements rows.

If we can get always-up-to-date materialized views, then Denty's work on
query rewrite would have greatly enhanced utility.

[1]
https://postgresconf.org/conferences/2018/program/proposals/a-roadmap-to-continuous-materialized-views-b4644661-8d5a-4186-8c17-4fb82600e147
[2]
http://databasedoings.blogspot.com/2018/06/ive-posted-my-slides-for-my-asssertions.html


Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-17 Thread Pavel Stehule
2018-06-16 16:21 GMT+02:00 John Dent :

> Hi folks,
>
> I thought I’d share an update to my pet project, which dynamically
> rewrites queries to target materialized views when they are available and
> can satisfy a query (or part of it) with a lower cost plan.
>
> The extension is now a regular EXTENSION and no longer tied in to the FDW
> mechanism. As a result, it may now be more generally usable, and less
> complicated to integrate into an existing system. (The FDW approach was an
> easy way for me to get started, but it ultimately added complexity and was
> rather limiting.)
>
> Same caution as before applies:
>
> **NOTE: this is not "production ready" code — if it works for you, then
> great, but use it after thorough testing, and with appropriate caution.**
>
> Built, and has rudimentary testing against Postgres 10.1..10.3.
>
> Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite
> README: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite/
> blob/master/README.md
>
> Hope it is useful or interesting for someone! Questions or comments are
> very welcome.
>

good idea.

Regards

Pavel



> denty.
>
> Begin original message:
>
> *From: *Dent John 
> *Subject: **Query Rewrite for Materialized Views (FDW Extension)*
> *Date: *5 April 2018 at 14:41:15 BST
> *To: *pgsql-hackers@lists.postgresql.org
>
> Hi,
>
> I wanted to share the project I've been working on which dynamically
> rewrites queries to target materialized views when views are available that
> can satisfy part of a query with lower cost plans.
>
> I embarked upon as an interesting side project. It took me a bit more time
> than I anticipated, but the result works for my use case. Because of that,
> I thought it worth sharing. However I would caution that my use case is not
> exactly of a commercial scale... so please heed the following obligatory
> warning:
>
> **NOTE: this is not "production ready" code — if it works for you, then
> great, but use it after thorough testing, and with appropriate caution.**
>
> There are some limitations to the rewrite opportunities it takes up, and
> it will almost certainly fail on complex materialized views composed of
> deeply nested queries.
>
> The extension does not have extensive (actually: any) documentation, but
> the few test cases should make obvious to the inclined reader how it works.
> This is deliberate at this early a stage: I don't want to encourage
> uninformed adoption because of the possibility of data loss or incorrect
> query rewrites.
>
> The extension is written against a Postgres 10.1 source tree.
>
> Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite
>
> Questions or comments are very welcome.
>
> denty.
>
>
>