Re: Implementing Incremental View Maintenance

2019-06-28 Thread Yugo Nagata
Hi Greg,

On Wed, 3 Apr 2019 17:41:36 -0400
Greg Stark  wrote:

> On Sun, 31 Mar 2019 at 23:22, Yugo Nagata  wrote:
> >
> > Firstly, this will handle simple definition views which includes only
> > selection, projection, and join.  Standard aggregations (count, sum, avg,
> > min, max) are not planned to be implemented in the first patch, but these
> > are commonly used in materialized views, so I'll implement them later on.
> 
> It's fine to not have all the features from day 1 of course. But I
> just picked up this comment and the followup talking about splitting
> AVG into SUM and COUNT and I had a comment. When you do look at
> tackling aggregates I don't think you should restrict yourself to
> these specific standard aggregations. We have all the necessary
> abstractions to handle all aggregations that are feasible, see
> https://www.postgresql.org/docs/devel/xaggr.html#XAGGR-MOVING-AGGREGATES
> 
> What you need to do -- I think -- is store the "moving aggregate
> state" before the final function. Then whenever a row is inserted or
> deleted or updated (or whenever another column is updated which causes
> the row to enter or leave the aggregation), apply either aggtransfn or
> aggminvtransfn to the state. I'm not sure if you want to apply the
> final function on every update or only lazily; either may be better
> in some usages.

Thank you for your suggestion! I have just submitted the latest patch, which
supports some aggregate functions, but it supports only sum and count and
lacks this kind of generalization. However, I would like to refine it to
support more general aggregate functions, and I think your suggestion will
help me do that. Thank you!

Best regards,
Yugo Nagata


-- 
Yugo Nagata 




Re: Implementing Incremental View Maintenance

2019-06-28 Thread Yugo Nagata
Hi,

Attached is a WIP patch of IVM which supports some aggregate functions.

Currently, only count and sum are supported. Avg, min, and max are not
supported yet, although I think supporting them would not be so hard.

As a restriction, expressions specified in GROUP BY must appear in the target
list of the view, because tuples to be updated in the MV are identified by
these group keys.


In the case of views without aggregate functions, only the number of tuple 
duplicates (__ivm_count__) is updated at incremental maintenance. On the other 
hand, in the case of views with aggregates, the aggregated values are also 
updated. How the update is performed depends on the kind of aggregate function.

In the case of sum (and aggregate functions other than count), NULL input
values are ignored, and the aggregate returns NULL when no rows are selected.
To support this specification, the number of not-NULL input values is counted
and stored in the MV as a hidden column, with a name like __ivm_count_sum__,
for example.

In the case of count, the aggregate returns zero when no rows are selected,
and count(*) doesn't ignore NULL inputs. These specifications are also
supported.
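The bookkeeping described above can be sketched in Python (illustrative only,
not the patch's code; the hidden-column names follow the mail, everything else
is an assumption of this sketch):

```python
# Toy model of one aggregate MV row per group key, maintaining SUM(j) and
# COUNT(*) incrementally, with a hidden not-NULL counter (__ivm_count_sum__)
# so that SUM returns NULL once all of its not-NULL inputs are gone.

class IvmAggRow:
    """One matview row: SUM(j), hidden not-NULL count for SUM, and COUNT(*)."""
    def __init__(self):
        self.sum = None          # SUM(j): NULL until a not-NULL value arrives
        self.count_sum = 0       # hidden __ivm_count_sum__: not-NULL inputs
        self.count_star = 0      # COUNT(*): counts NULL inputs too

def apply_delta(view, key, j, sign):
    """sign=+1 for an inserted base row, -1 for a deleted one."""
    row = view.setdefault(key, IvmAggRow())
    row.count_star += sign
    if j is not None:            # SUM ignores NULL inputs
        row.sum = (row.sum or 0) + sign * j
        row.count_sum += sign
        if row.count_sum == 0:
            row.sum = None       # SUM over zero not-NULL inputs is NULL
    if row.count_star == 0:
        del view[key]            # the group disappeared from the view

view = {}
apply_delta(view, 2, 20, +1)     # INSERT (2, 20)
apply_delta(view, 2, None, +1)   # NULL ignored by SUM, counted by COUNT(*)
apply_delta(view, 2, 20, -1)     # DELETE (2, 20): SUM falls back to NULL
```

After the three changes, the group 2 row still exists (COUNT(*) is 1 because
of the NULL row), but SUM is NULL again since its last not-NULL input was
deleted.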

Tuples to be updated in the MV are identified by the keys specified in the
GROUP BY clause. However, in the case of aggregation without GROUP BY, there
is only one tuple in the view, so keys are not used to identify tuples.


In addition, a race condition which could occur in the previous version is 
prevented in this patch. In the previous version, when two transactions 
changed a base table concurrently, an anomalous update of the MV was possible 
because a change made in one transaction was not visible to the other 
transaction even at the READ COMMITTED level.

To prevent this, the patch takes a lock at an early stage of view maintenance 
to wait until concurrent transactions updating the same MV have finished. 
Also, we have to get the latest snapshot before computing the delta tables, 
because any changes that occur in another transaction during the lock wait 
are not visible even at the READ COMMITTED level.

At the REPEATABLE READ or SERIALIZABLE level, we don't wait for the lock but 
raise an error immediately to prevent an anomalous update. These solutions 
might be ugly, but something that prevents anomalous updates is necessary in 
any case. There may be a better way.


Moreover, some regression tests are added for the aggregate function support.
This is Hoshiai-san's work.

Although the code is not refined yet and will need a good deal of refactoring
and reorganizing, I am submitting it to share the current status.


* Example (from regression test)

===
(1) creating tables

CREATE TABLE mv_base_a (i int, j int);
INSERT INTO mv_base_a VALUES
  (1,10),
  (2,20),
  (3,30),
  (4,40),
  (5,50);


(2) Views with SUM() and COUNT() aggregate functions

BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j), COUNT(i)  
FROM mv_base_a GROUP BY i;
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-+---
 1 |  10 | 1
 2 |  20 | 1
 3 |  30 | 1
 4 |  40 | 1
 5 |  50 | 1
(5 rows)

INSERT INTO mv_base_a VALUES(2,100);
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-+---
 1 |  10 | 1
 2 | 120 | 2
 3 |  30 | 1
 4 |  40 | 1
 5 |  50 | 1
(5 rows)

UPDATE mv_base_a SET j = 200 WHERE (i,j) = (2,100);
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-+---
 1 |  10 | 1
 2 | 220 | 2
 3 |  30 | 1
 4 |  40 | 1
 5 |  50 | 1
(5 rows)

DELETE FROM mv_base_a WHERE (i,j) = (2,200);
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-+---
 1 |  10 | 1
 2 |  20 | 1
 3 |  30 | 1
 4 |  40 | 1
 5 |  50 | 1
(5 rows)

ROLLBACK;


(3) Views with the COUNT(*) aggregate function

BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j),COUNT(*)  
FROM mv_base_a GROUP BY i;
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-+---
 1 |  10 | 1
 2 |  20 | 1
 3 |  30 | 1
 4 |  40 | 1
 5 |  50 | 1
(5 rows)

INSERT INTO mv_base_a VALUES(2,100);
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-+---
 1 |  10 | 1
 2 | 120 | 2
 3 |  30 | 1
 4 |  40 | 1
 5 |  50 | 1
(5 rows)

ROLLBACK;

(4) Views with an aggregate function and no GROUP BY clause

BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_group AS SELECT SUM(j) FROM 
mv_base_a;
SELECT * FROM mv_ivm_group ORDER BY 1;
 sum
-
 150
(1 row)

INSERT INTO mv_base_a VALUES(6,20);
SELECT * FROM mv_ivm_group ORDER BY 1;
 sum
-
 170
(1 row)
===



On Thu, 20 Jun 2019 16:44:10 +0900
Yugo Nagata  wrote:

> Hi hackers,
> 
> Thank you for your many questions and feedbacks at PGCon 2019.
> Attached is the patch rebased for the current master branch.
> 
> Regards,
> Yugo Nagata
> 
> On Tue, 14 May 2019 15:46:48 +0900
> Yugo 

Re: Implementing Incremental View Maintenance

2019-06-28 Thread Yugo Nagata
Hi Jim,

On Fri, 21 Jun 2019 08:41:11 -0700 (MST)
Jim Finnerty  wrote:

> Hi Yugo,
> 
> I'd like to compare the performance of your MV refresh algorithm versus
> an approach that logs changes into an mv log table, and can then apply the
> changes at some later point in time.  I'd like to handle the materialized
> join view (mjv) case first, specifically a 2-way left outer join, with a UDF
> in the SELECT list of the mjv.

Do you mean you have your own implementation of IVM that uses log tables?
I'm very interested in this, and I would appreciate it if you could explain
it in detail.
 
> Does your refresh algorithm handle mjv's with connected join graphs that
> consist entirely of inner and left outer joins?

> If so, I'd like to measure the overhead of your refresh algorithm on
> pgbench, modified to include an mjv, versus a (hand coded) incremental
> maintenance algorithm that uses mv log tables populated by ordinary
> triggers.  We may also want to look at capturing the deltas using logical
> replication, which ought to be faster than a trigger-based solution. 

In our current implementation, outer joins are not yet supported, though we
plan to handle them in the future. So we would not be able to compare the two
directly on the same workload at the moment.

However, our current implementation only supports updating materialized views 
from a trigger, and the performance of modifying base tables will be lower 
than with the approach that uses log tables. This is because the queries that 
update the materialized views are issued inside the trigger. This is not only 
an overhead in itself, but it also takes a lock on the materialized view, 
which has an impact on concurrent execution performance.

In our previous PoC, we implemented IVM using log tables, in which logs are 
captured by triggers and materialized views are updated incrementally by a 
user command [1]. However, to implement the log-table approach, we need an 
infrastructure to maintain these logs: for example, to decide which logs are 
necessary and which logs can be discarded, etc. We thought this was not 
trivial work, so we decided to start from the current approach, which doesn't 
use log tables. We are now preparing to implement this as the next step, 
because it is also needed to support deferred maintenance of views.

[1] 
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/

I agree that capturing the deltas using logical decoding will be faster than 
using a trigger, although we haven't considered this well yet.

Best regards,
Yugo Nagata


-- 
Yugo Nagata 




Re: Implementing Incremental View Maintenance

2019-05-14 Thread Yugo Nagata
On Mon, 1 Apr 2019 12:11:22 +0900
Yugo Nagata  wrote:

> On Thu, 27 Dec 2018 21:57:26 +0900
> Yugo Nagata  wrote:
> 
> > Hi,
> > 
> > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. 
> >  
> 
> I am now working on an initial patch for implementing IVM on PostgreSQL.
> This enables materialized views to be updated incrementally after one
> of their base tables is modified.

Attached is a WIP patch of Incremental View Maintenance (IVM).
The major part was written by me; the changes to the syntax and pg_class
are Hoshiai-san's work.

Although this is still a draft work-in-progress patch, any
suggestions or thoughts would be appreciated.
 
* What it is

This allows a kind of immediate maintenance of materialized views. If a 
materialized view is created by the CREATE INCREMENTAL MATERIALIZED VIEW 
command, the contents of the matview are updated automatically and 
incrementally after its base tables are updated. Note that this syntax is 
just tentative, so it may be changed.

== Example 1 ==
postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m AS SELECT * FROM t0;
SELECT 3
postgres=# SELECT * FROM m;
 i 
---
 3
 2
 1
(3 rows)

postgres=# INSERT INTO t0 VALUES (4);
INSERT 0 1
postgres=# SELECT * FROM m; -- automatically updated
 i 
---
 3
 2
 1
 4
(4 rows)
=

This implementation also supports matviews containing duplicate tuples or,
conversely, a DISTINCT clause in the view definition query.  For example, even
if a matview is defined with DISTINCT to remove duplicate tuples from a base
table, incremental update of the matview is performed properly. That is, the
contents of the matview don't change when existing tuples are inserted into
the base tables again, and a tuple is deleted from the matview only when the
multiplicity of the corresponding tuple in the base table becomes zero.

This is achieved by the "counting algorithm", in which the multiplicity of
each tuple is stored in the matview as a special column value.
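The counting algorithm can be sketched in a few lines of Python (a toy
in-memory model, not the patch's code), reproducing the behavior of Example 2
below:

```python
# "Counting algorithm": the view stores each tuple once together with its
# multiplicity (the __ivm_count__ column); a DISTINCT view simply hides it.
from collections import Counter

def insert(view, tup):
    view[tup] += 1               # bump multiplicity instead of inserting again

def delete(view, tup):
    view[tup] -= 1
    if view[tup] == 0:
        del view[tup]            # removed only when multiplicity reaches zero

view = Counter()
for t in ['A', 'B', 'C', 'A']:   # initial contents of base table t1
    insert(view, t)
insert(view, 'B')                # INSERT (5,'B')
delete(view, 'A')                # DELETE (1,A)
delete(view, 'C')                # DELETE (3,C)

with_duplicates = sorted(view.elements())   # m1: every tuple, repeated
distinct = sorted(view)                     # m2: each tuple once
```

Here `with_duplicates` ends up as ['A', 'B', 'B'] and `distinct` as
['A', 'B'], matching the m1 and m2 results in Example 2.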

== Example 2 ==
postgres=# SELECT * FROM t1;
 id | t 
+---
  1 | A
  2 | B
  3 | C
  4 | A
(4 rows)

postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m1 AS SELECT t FROM t1;
SELECT 4
postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m2 AS SELECT DISTINCT t FROM t1;
SELECT 3
postgres=# SELECT * FROM m1; -- with duplicates
 t 
---
 A
 A
 C
 B
(4 rows)

postgres=# SELECT * FROM m2;
 t 
---
 A
 B
 C
(3 rows)

postgres=# INSERT INTO t1 VALUES (5, 'B');
INSERT 0 1
postgres=# DELETE FROM t1 WHERE id IN (1,3); -- delete (1,A),(3,C)
DELETE 2
postgres=# SELECT * FROM m1; -- one A left and one more B
 t 
---
 B
 B
 A
(3 rows)

postgres=# SELECT * FROM m2; -- only C is removed
 t 
---
 B
 A
(2 rows)
=

* How it works

1. Creating matview

When a matview is created, AFTER triggers are internally created
on its base tables. When a base table is modified (INSERT, DELETE,
UPDATE), the matview is updated incrementally in the trigger function.

When populating the matview, GROUP BY and count(*) are added to the
view definition query before it is executed, in order to count the
multiplicity of tuples in the matview. The result of count is stored in
the matview as a special column named "__ivm_count__".

2. Maintenance of matview

When base tables are modified, the change set of each table can be
referred to as an Ephemeral Named Relation (ENR) thanks to transition
tables (a trigger feature implemented since PG10). We can calculate the
diff set of the matview by replacing the base table in the view definition
query with the ENR (at least for Selection-Projection-Join views).
As at view definition time, GROUP BY and count(*) are added in order
to count the multiplicity of tuples in the diff set. As a result, two diff
sets (tuples to be deleted from and to be inserted into the matview) are
calculated, and the results are stored into temporary tables respectively.

The matview is updated by merging these change sets. Instead of simply
executing DELETE or INSERT, the value of the __ivm_count__ column in the
matview is decreased or increased. When the value becomes zero, the
corresponding tuple is deleted from the matview.
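The merge step can be sketched in Python (an illustrative toy, not the
patch's code; the matview and both diff tables are modeled as maps from a
tuple to its __ivm_count__):

```python
# Applying the two diff sets to the matview by adjusting per-tuple counts
# rather than issuing plain DELETEs and INSERTs.
from collections import Counter

def apply_diffs(view, diff_delete, diff_insert):
    """Merge the delete-diff and insert-diff into the matview.
    All three map tuple -> count (the __ivm_count__ column)."""
    for tup, n in diff_delete.items():
        view[tup] -= n           # decrease multiplicity
        if view[tup] <= 0:
            del view[tup]        # count reached zero: tuple leaves the view
    for tup, n in diff_insert.items():
        view[tup] += n           # increase multiplicity (new or existing tuple)

view = Counter({('A',): 2, ('B',): 1})
# An UPDATE of one 'A' row to 'B' arrives as a delete-diff plus an insert-diff:
apply_diffs(view, diff_delete={('A',): 1}, diff_insert={('B',): 1})
```

After the merge the view holds ('A',) with count 1 and ('B',) with count 2;
neither tuple was physically deleted and re-inserted.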

3. Access to matview

When SELECT is issued against an IVM matview defined with DISTINCT, all
columns except __ivm_count__ of each tuple in the matview are returned.
This is correct because duplicate tuples are eliminated by GROUP BY.

When DISTINCT is not used, SELECT on the IVM matview returns each tuple
__ivm_count__ times. Currently, this is implemented by rewriting the SELECT
query to replace the matview RTE with a subquery which joins the matview
and the generate_series function, as below.

 SELECT mv.* FROM mv, generate_series(1, mv.__ivm_count__);

The __ivm_count__ column is invisible to users when "SELECT * FROM ..." is
issued, but users can see the value by specifying it in the target list
explicitly.
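The effect of the generate_series join can be sketched in Python (an
illustrative model of the scan, not the rewriter itself):

```python
# Expanding stored tuples by __ivm_count__ on read: the effect of joining
# the matview to generate_series(1, mv.__ivm_count__).
def scan(view):
    for tup, count in sorted(view.items()):
        for _ in range(count):   # emit each tuple __ivm_count__ times
            yield tup

view = {('A',): 2, ('B',): 1}    # tuple -> __ivm_count__
rows = list(scan(view))          # [('A',), ('A',), ('B',)]
```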

== Example 3 ==
postgres=# \d+ m1
  Materialized view "public.m1"
Column |  Type  | Collation | Nullable | Default | Storage  | 

Re: Implementing Incremental View Maintenance

2019-04-03 Thread Greg Stark
On Sun, 31 Mar 2019 at 23:22, Yugo Nagata  wrote:
>
> Firstly, this will handle simple definition views which includes only
> selection, projection, and join.  Standard aggregations (count, sum, avg,
> min, max) are not planned to be implemented in the first patch, but these
> are commonly used in materialized views, so I'll implement them later on.

It's fine to not have all the features from day 1 of course. But I
just picked up this comment and the followup talking about splitting
AVG into SUM and COUNT and I had a comment. When you do look at
tackling aggregates I don't think you should restrict yourself to
these specific standard aggregations. We have all the necessary
abstractions to handle all aggregations that are feasible, see
https://www.postgresql.org/docs/devel/xaggr.html#XAGGR-MOVING-AGGREGATES

What you need to do -- I think -- is store the "moving aggregate
state" before the final function. Then whenever a row is inserted or
deleted or updated (or whenever another column is updated which causes
the row to enter or leave the aggregation), apply either aggtransfn or
aggminvtransfn to the state. I'm not sure if you want to apply the
final function on every update or only lazily; either may be better
in some usages.
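Greg's suggestion above can be sketched in Python with a toy avg-like
aggregate (illustrative only: the method names transfn/inv_transfn/finalfn
mirror the roles of PostgreSQL's transition, inverse-transition, and final
functions but are not the catalog names):

```python
# Store the moving-aggregate transition state in the view and update it with
# the forward/inverse transition functions; apply the final function lazily.

class MovingAvg:
    """avg-like aggregate kept as a (sum, count) transition state."""
    def __init__(self):
        self.state = (0, 0)
    def transfn(self, x):            # a row enters the aggregation
        s, n = self.state
        self.state = (s + x, n + 1)
    def inv_transfn(self, x):        # a row leaves the aggregation
        s, n = self.state
        self.state = (s - x, n - 1)
    def finalfn(self):               # computed only when the value is read
        s, n = self.state
        return s / n if n else None

agg = MovingAvg()
for x in (10, 20, 30):               # three rows inserted into the base table
    agg.transfn(x)
agg.inv_transfn(20)                  # one row deleted: no rescan needed
result = agg.finalfn()               # avg of the remaining rows
```

Only the small transition state is stored per group; deletes are handled by
the inverse transition instead of recomputing from scratch.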




Re: Implementing Incremental View Maintenance

2019-03-31 Thread Yugo Nagata
On Thu, 27 Dec 2018 21:57:26 +0900
Yugo Nagata  wrote:

> Hi,
> 
> I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.  

I am now working on an initial patch for implementing IVM on PostgreSQL.
This enables materialized views to be updated incrementally after one
of their base tables is modified.

For the first patch, I want to start with very simple features.

Firstly, it will handle views with simple definitions which include only
selection, projection, and join.  Standard aggregates (count, sum, avg,
min, max) are not planned to be implemented in the first patch, but these
are commonly used in materialized views, so I'll implement them later on.
Views which include sub-queries, outer joins, CTEs, or window functions are
also out of scope for the first patch. Also, views including self-joins or
views referencing other views in their definitions have not been considered
well, either. I need more investigation on these types of views, although I
have found some papers explaining how to handle sub-queries and outer joins.

Next, it will handle materialized views with no duplicates in their
tuples. I am thinking of implementing an algorithm to handle duplicates,
called the "counting algorithm", afterward, but I'll start from this
no-duplicates assumption in the first patch for simplicity.

In the first patch, I will implement only "immediate maintenance", that is,
materialized views are updated immediately in the transaction where a base
table is modified.  On the other hand, in "deferred maintenance",
materialized views are updated after the transaction, for example by a user
command like REFRESH. Although I plan to implement both eventually, I'll
start from "immediate" because this seems to need less code than "deferred".
Implementing "deferred" requires a mechanism to maintain logs recording
changes, and an algorithm to compute the delta to be applied to
materialized views.
 
I plan to implement the immediate maintenance using AFTER triggers created 
automatically on a materialized view's base tables.  In an AFTER trigger 
using the transition table feature, changes occurring on base tables are 
recorded in ephemeral relations. We can compute the delta to be applied to 
the materialized view by using these ephemeral relations and the view 
definition query, and then update the view by applying this delta.

-- 
Yugo Nagata 




Re: Implementing Incremental View Maintenance

2019-03-14 Thread Mitar
Hi!

On Thu, Jan 31, 2019 at 6:20 AM Yugo Nagata  wrote:
> BTW, what is the use case of reactive/live queries? (just curious)

It allows syncing the state between client and server. Client can then
have a subset of data and server can push changes as they are
happening to the client. Client can in a reactive manner render that
in the UI to the user. So you can easily create a reactive UI which
always shows up-to-date data without having to poll or something
similar.

How are things progressing? Any news on this topic?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Implementing Incremental View Maintenance

2019-01-31 Thread Yugo Nagata
On Mon, 7 Jan 2019 00:39:00 -0800
Mitar  wrote:

> That sounds great! I am interested in this topic because I am
> interested in reactive/live queries and support for them in
> PostgreSQL. [1]
> 
> In that context, the problem is very similar: based on some state of
> query results and updated source tables, determine what should be new
> updates to send to the client describing changes to the query results.
> So after computing those incremental changes, instead of applying them
> to materialized view I would send them to the client. One could see
> materialized views as just one type of consumer of such information about
> incremental change.
> 
> So I would like to ask if whatever is done in this setting is done in
> a way that one could also use outside of the context of materialized views.
> Not sure what the API would be, though.

I didn't know about reactive/live queries, but this seems to share part of
the problem with IVM, so we might have a common API.

BTW, what is the use case of reactive/live queries? (just curious)
 
> > For these reasons, we started to think about implementing IVM without
> > relying on OIDs, and did some more surveying.
> 
> I also do not see much difference between asking users to have primary
> key on base tables or asking them to have OIDs. Why do you think that
> a requirement for primary keys is a hard one? I think we should first
> focus on having IVM with base tables with primary keys. Maybe then
> later on we could improve on that and make it also work without.
>  
> To me personally, having unique index on source tables and also on
> materialized view is a reasonable restriction for this feature.
> Especially for initial versions of it.

Initially, I chose to use OIDs for a theoretical reason, that is, to handle
"bag semantics" which allows duplicate rows in tables. However, I agree
that we can start from the restriction of having a unique index on base
tables.
 
> > If we can represent a change of UPDATE on a base table as query-like rather 
> > than
> > OLD and NEW, it may be possible to update the materialized view directly 
> > instead
> > of performing delete & insert.
> 
> Why do you need OLD and NEW? Don't you need just NEW and a list of
> columns which changed from those in NEW? I use such diffing query [4]
> to represent changes: first column has a flag telling if the row is
> representing insert, update, and remove, the second column tells which
> column are being changed in the case of the update, and then the NEW
> columns follow.

According to the change-propagation-equation approach, OLD is necessary
to calculate the tuples in the MV to be deleted or modified. However, if
tables have unique keys, such tuples can be identified using the keys, so
OLD may not be needed, at least in the eager approach.

In the lazy approach, the OLD contents of tables are useful. For example,
with a join view MV = R * S, when dR is inserted into R and dS is inserted
into S, the delta to be inserted into MV will be

 dMV = (R_old * dS) + (dR * S_new)
     = (R_old * dS) + (dR * S_old) + (dR * dS)

hence the old contents of tables R and S are needed.
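The delta equation can be checked on small multisets in Python (an
illustrative sketch where '*' is a natural join on the key and '+' is
multiset union; the table contents are made up for the check):

```python
# Verify  new MV = old MV + dMV  with
#   dMV = (R_old * dS) + (dR * S_old) + (dR * dS)
from collections import Counter

def join(r, s):
    """Natural join of two multisets of (key, payload) pairs on the key."""
    out = Counter()
    for (kr, a), nr in r.items():
        for (ks, b), ns in s.items():
            if kr == ks:
                out[(kr, a, b)] += nr * ns   # multiplicities multiply
    return out

R_old = Counter({(1, 'r1'): 1}); dR = Counter({(2, 'r2'): 1})
S_old = Counter({(1, 's1'): 1}); dS = Counter({(2, 's2'): 1})
R_new, S_new = R_old + dR, S_old + dS

dMV = join(R_old, dS) + join(dR, S_old) + join(dR, dS)
# The delta plus the old view equals a full recomputation of the new view:
assert join(R_new, S_new) == join(R_old, S_old) + dMV
```

The assertion holds precisely because the expansion uses R_old and S_old,
which is why the lazy approach needs the old table contents.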
 
> I think that maybe standardizing structure for representing those
> changes would be a good step towards making this modular and reusable.
> Because then we can have three parts:
> 
> * Recording and storing changes in a standard format.
> * A function which given original data, stored changes, computes
> updates needed, also in some standard format.
> * A function which given original data and updates needed, applies them.

> I think if we split things into three parts as I described above, then
> this is just a question of configuration. Or you call all three inside
> one trigger to update in "eager" fashion. Or you store computed
> updates somewhere and then on demand apply those in "lazy" fashion.

I agree that defining a format to represent changes is important. However,
I am not sure both eager and lazy can be handled in the same manner. I'll
consider this more.

> > In the eager maintenance approach, we have to consider a race condition
> > where two different transactions change base tables simultaneously, as
> > discussed in [4].
> 
> But in the case of "lazy" maintenance there is a mirror problem: what
> if later changes to base tables invalidate some previous change to the
> materialized view. Imagine that one cell in a base table is first
> updated too "foo" and we compute an update for the materialized view
> to set it to "foo". And then the same cell is updated to "bar" and we
> compute an update for the materialized view again. If we have not
> applied any of those updates (because we are "lazy") now the
> previously computed update can be discarded. We could still apply
> both, but it would not be efficient.

In our PoC implementation, I handled this situation by removing
old contents from the NEW delta table. In your example, when the base
table is updated from "foo" to "bar", the "foo" tuple is removed
from, and the "bar" tuple is inserted into, the NEW delta 

Re: Implementing Incremental View Maintenance

2019-01-31 Thread Yugo Nagata
On Tue, 1 Jan 2019 14:46:25 +0700
Nguyễn Trần Quốc Vinh  wrote:

> We have some results on incremental update for MVs. We generate triggers in
> C to do the incremental maintenance. We posted the code to GitHub about a
> year ago, but unfortunately I posted a wrong ctrigger.h header. The
> mistake was exposed when a person could not compile the generated
> triggers and reported it to me. Now I have re-posted it with the right
> ctrigger.h file.
> 
> You can find the codes of the generator here:
> https://github.com/ntqvinh/PgMvIncrementalUpdate/commits/master. You can
> find how did we do here:
> https://link.springer.com/article/10.1134/S0361768816050066. The paper is
> about generating the code in pl/pgsql. Anyway, I see it is useful for
> reading the code. I don't know if I can share the paper, so I won't
> publish it anywhere else. The text about how to generate triggers in C
> was published with open access, but unfortunately it is in Vietnamese.
> 
> We are happy if the code is useful for someone.

I have read your paper. It is interesting, and it is great that the
algorithm is described so concretely.

After reading it, I have a few questions about your implementation.
Although I might be able to work them out by reading your paper and code
carefully, I would appreciate it if you could answer them.

- It is said there are many limitations on the view definition query.
  How does this work when the query is not supported?

- Is it possible to support materialized views that have DISTINCT,
  OUTER JOIN, or sub-query in your approach?

- It is said that AVG is split into SUM and COUNT. Are these new additional
  columns in the MV visible to users?

- Can this handle the race condition discussed in [1]? That is,
  if concurrent transactions update two different tables in the join
  view definition, is the MV updated successfully?

[1] 
https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com

Regards,
-- 
Yugo Nagata 



Re: Implementing Incremental View Maintenance

2019-01-07 Thread Mitar
Hi!

On Thu, Dec 27, 2018 at 4:57 AM Yugo Nagata  wrote:
> I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
> IVM is a technique to maintain materialized views which computes and applies
> only the incremental changes to the materialized views rather than
> recomputing the contents as the current REFRESH command does.

That sounds great! I am interested in this topic because I am
interested in reactive/live queries and support for them in
PostgreSQL. [1]

In that context, the problem is very similar: based on some state of
query results and updated source tables, determine what should be new
updates to send to the client describing changes to the query results.
So after computing those incremental changes, instead of applying them
to materialized view I would send them to the client. One could see
materialized views as just one type of consumer of such information about
incremental change.

So I would like to ask if whatever is done in this setting is done in
a way that one could also use outside of the context of materialized views.
Not sure what the API would be, though.

From the perspective of reactive/live queries, this package [2] is
interesting. To my understanding, it adds to all base tables two
columns, one for unique ID and one for revision of the row. And then
rewrites queries so that this information is passed all the way to
query results. In this way it can then determine mapping between
inputs and outputs. I am not sure if it then does incremental update
or just uses that to determine if view is invalidated. Not sure if
there is anything about such approach in literature. Or why both index
and revision columns are needed.

> For these reasons, we started to think about implementing IVM without
> relying on OIDs, and did some more surveying.

I also do not see much difference between asking users to have primary
key on base tables or asking them to have OIDs. Why do you think that
a requirement for primary keys is a hard one? I think we should first
focus on having IVM with base tables with primary keys. Maybe then
later on we could improve on that and make it also work without.

To me personally, having unique index on source tables and also on
materialized view is a reasonable restriction for this feature.
Especially for initial versions of it.

> However, the discussion about IVM has now stopped, so we would like to
> restart and make progress on this.

What would be next steps in your view to move this further?

> If we can represent a change of UPDATE on a base table as query-like rather 
> than
> OLD and NEW, it may be possible to update the materialized view directly 
> instead
> of performing delete & insert.

Why do you need OLD and NEW? Don't you need just NEW and a list of
columns which changed? I use such a diffing query [4]
to represent changes: the first column has a flag telling whether the row
represents an insert, update, or remove; the second column tells which
columns are being changed in the case of an update; and then the NEW
columns follow.

I think that maybe standardizing structure for representing those
changes would be a good step towards making this modular and reusable.
Because then we can have three parts:

* Recording and storing changes in a standard format.
* A function which given original data, stored changes, computes
updates needed, also in some standard format.
* A function which given original data and updates needed, applies them.

> In the previous discussion [4], it was planned to start from the "eager"
> approach. In our PoC implementation, we used the other approach, that is,
> using the REFRESH command to perform IVM. I am not sure which is better as
> a starting point, but I am beginning to think that the eager approach may
> be simpler, since we don't have to maintain base table changes from other
> past transactions.

I think if we split things into three parts as I described above, then
this is just a question of configuration. Or you call all three inside
one trigger to update in "eager" fashion. Or you store computed
updates somewhere and then on demand apply those in "lazy" fashion.

> In the eager maintenance approach, we have to consider a race condition
> where two different transactions change base tables simultaneously, as
> discussed in [4].

But in the case of "lazy" maintenance there is a mirror problem: what
if later changes to base tables invalidate some previous change to the
materialized view. Imagine that one cell in a base table is first
updated too "foo" and we compute an update for the materialized view
to set it to "foo". And then the same cell is updated to "bar" and we
compute an update for the materialized view again. If we have not
applied any of those updates (because we are "lazy") now the
previously computed update can be discarded. We could still apply
both, but it would not be efficient.

[1] 
https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
[2] 

Re: Implementing Incremental View Maintenance

2019-01-06 Thread Nguyễn Trần Quốc Vinh
Dear All,

The tool analyzes the input query and then generates triggers (trigger
functions and pl/pgsql scripts as well) on all modifying events
(insert/update/delete) for all underlying base tables. The triggers do
incremental updates to the table that contains the query result (MV). You
can build the tool, then see the provided example and try it. It is
for synchronous maintenance. It was tested extensively, but use it at
your own risk.

For asynchronous maintenance, we generate 1) triggers on all modifying
events on base tables to collect the data changes and save them to
'special' tables; and 2) the tool then does incremental updates of the MVs.

Best regards,

Vinh

TS. Nguyễn Trần Quốc Vinh
---
Chủ nhiệm khoa Tin học
Trường ĐH Sư phạm - ĐH Đà Nẵng

Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Website: http://it.ued.udn.vn; http://www.ued.vn ;
http://www.ued.udn.vn
SCV: http://scv.ued.vn/~ntquocvinh 
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98


On Mon, Jan 7, 2019 at 9:00 AM Tatsuo Ishii  wrote:

> > Hi all, just wanted to say  I am very happy to see progress made on this,
> > my codebase has multiple "materialized tables" which are maintained with
> > statement triggers (transition tables) and custom functions. They are
> ugly
> > and a pain to maintain, but they work because I have no other
> > solution...for now at least.
> >
> > I am concerned that the eager approach only addresses a subset of the MV
> use
> >> case space, though. For example, if we presume that an MV is present
> >> because
> >> the underlying direct query would be non-performant, then we have to at
> >> least question whether applying the delta-update would also be
> detrimental
> >> to some use cases.
> >>
> >
> > I will say that in my case, as long as my reads of the materialized view
> > are always consistent with the underlying data, that's what's
> important.  I
> > don't mind if it's eager, or lazy (as long as lazy still means it will
> > refresh prior to reading).
>
> Assuming that we want to implement IVM incrementally (that means, for
> example, we implement DELETE for IVM in PostgreSQL XX, then INSERT for
> IVM for PostgreSQL XX+1... etc.), I think it's hard to do that with an
> eager approach if we want the MV to always be consistent with the base tables.
>
> On the other hand, a lazy approach allows us to implement IVM
> incrementally, because we could always fall back to building the full MV
> from scratch if operations on the MV include queries we do not support.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>


Re: Implementing Incremental View Maintenance

2019-01-06 Thread Tatsuo Ishii
> Hi all, just wanted to say  I am very happy to see progress made on this,
> my codebase has multiple "materialized tables" which are maintained with
> statement triggers (transition tables) and custom functions. They are ugly
> and a pain to maintain, but they work because I have no other
> solution...for now at least.
> 
> I am concerned that the eager approach only addresses a subset of the MV use
>> case space, though. For example, if we presume that an MV is present
>> because
>> the underlying direct query would be non-performant, then we have to at
>> least question whether applying the delta-update would also be detrimental
>> to some use cases.
>>
> 
> I will say that in my case, as long as my reads of the materialized view
> are always consistent with the underlying data, that's what's important.  I
> don't mind if it's eager, or lazy (as long as lazy still means it will
> refresh prior to reading).

Assuming that we want to implement IVM incrementally (that means, for
example, we implement DELETE for IVM in PostgreSQL XX, then INSERT for
IVM for PostgreSQL XX+1... etc.), I think it's hard to do that with an
eager approach if we want the MV to always be consistent with the base tables.

On the other hand, a lazy approach allows us to implement IVM
incrementally, because we could always fall back to building the full MV
from scratch if operations on the MV include queries we do not support.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Implementing Incremental View Maintenance

2018-12-31 Thread Nguyễn Trần Quốc Vinh
Dear all,

We have some results on incremental updates for MVs. We generate triggers
in C to do the incremental maintenance. We posted the code to GitHub about
a year ago, but unfortunately I posted a wrong ctrigger.h header. The
mistake was revealed to me when a person could not compile the generated
triggers and reported it to me. I have now re-posted the right ctrigger.h
file.

You can find the code of the generator here:
https://github.com/ntqvinh/PgMvIncrementalUpdate/commits/master. You can
find how we did it here:
https://link.springer.com/article/10.1134/S0361768816050066. The paper is
about generating the code in pl/pgsql. Anyway, I think it is useful for
reading the code. I don't know whether I may share the paper, so I don't
publish it anywhere else. The text about how to generate the triggers in C
was published open-access, but unfortunately it is in Vietnamese.

We are happy if the code is useful for someone.

Thank you and best regards,

NTQ Vinh

TS. Nguyễn Trần Quốc Vinh
---
Chủ nhiệm khoa Tin học
Trường ĐH Sư phạm - ĐH Đà Nẵng

Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Website: http://it.ued.udn.vn; http://www.ued.vn ;
http://www.ued.udn.vn
SCV: http://scv.ued.vn/~ntquocvinh 
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Mon, Dec 31, 2018 at 11:20 PM Adam Brusselback 
wrote:

> Hi all, just wanted to say  I am very happy to see progress made on this,
> my codebase has multiple "materialized tables" which are maintained with
> statement triggers (transition tables) and custom functions. They are ugly
> and a pain to maintain, but they work because I have no other
> solution...for now at least.
>
> I am concerned that the eager approach only addresses a subset of the MV
>> use
>> case space, though. For example, if we presume that an MV is present
>> because
>> the underlying direct query would be non-performant, then we have to at
>> least question whether applying the delta-update would also be detrimental
>> to some use cases.
>>
>
> I will say that in my case, as long as my reads of the materialized view
> are always consistent with the underlying data, that's what's important.  I
> don't mind if it's eager, or lazy (as long as lazy still means it will
> refresh prior to reading).
>


Re: Implementing Incremental View Maintenance

2018-12-31 Thread Adam Brusselback
Hi all, just wanted to say  I am very happy to see progress made on this,
my codebase has multiple "materialized tables" which are maintained with
statement triggers (transition tables) and custom functions. They are ugly
and a pain to maintain, but they work because I have no other
solution...for now at least.

I am concerned that the eager approach only addresses a subset of the MV use
> case space, though. For example, if we presume that an MV is present
> because
> the underlying direct query would be non-performant, then we have to at
> least question whether applying the delta-update would also be detrimental
> to some use cases.
>

I will say that in my case, as long as my reads of the materialized view
are always consistent with the underlying data, that's what's important.  I
don't mind if it's eager, or lazy (as long as lazy still means it will
refresh prior to reading).
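The "materialized table" pattern described above — a plain table kept up to date by statement-level triggers with transition tables — can be sketched as follows. All names (`sales`, `sales_daily_mt`, `sales_daily_maintain`) are invented for illustration; transition tables require PostgreSQL 10 or later, and `EXECUTE FUNCTION` requires 11:

```sql
-- The maintained summary table ("materialized table").
CREATE TABLE sales_daily_mt (
    day   date PRIMARY KEY,
    total numeric NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION sales_daily_maintain() RETURNS trigger AS $$
BEGIN
    -- Subtract the old rows, if this statement removed or replaced any.
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE sales_daily_mt m
        SET    total = m.total - d.sum_amount
        FROM  (SELECT day, sum(amount) AS sum_amount
               FROM old_rows GROUP BY day) d
        WHERE  m.day = d.day;
    END IF;
    -- Add the new rows, if this statement inserted or replaced any.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO sales_daily_mt AS m (day, total)
        SELECT day, sum(amount) FROM new_rows GROUP BY day
        ON CONFLICT (day) DO UPDATE SET total = m.total + EXCLUDED.total;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_daily_ins
AFTER INSERT ON sales
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION sales_daily_maintain();
-- Analogous statement triggers are needed for UPDATE (referencing both
-- OLD and NEW tables) and DELETE (referencing the OLD table).
```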


Re: Implementing Incremental View Maintenance

2018-12-31 Thread denty
Hi Yugo.

> I would like to implement Incremental View Maintenance (IVM) on
> PostgreSQL.

Great. :-)

I think it would address an important gap in PostgreSQL’s feature set.

> 2. How to compute the delta to be applied to materialized views
> 
> Essentially, IVM is based on relational algebra. Theoretically, changes on
> base
> tables are represented as deltas on this, like "R <- R + dR", and the
> delta on
> the materialized view is computed from base table deltas based on "change
> propagation equations".  For implementation, we have to derive the
> equation from
> the view definition query (Query tree, or Plan tree?) and describe this as
> a SQL
> query to compute the delta to be applied to the materialized view.

We had a similar discussion in this thread
https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu,
and I’m very much in agreement that the “change propagation equations”
approach can solve for a very substantial subset of common MV use cases.
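For a join view, the change propagation equation mentioned above takes a simple form: if V = R ⋈ S and a batch of rows dR is inserted into R, the delta to apply to the MV is dR ⋈ S. That delta is itself expressible as an ordinary SQL query — a sketch, with all table and column names invented (`new_r` standing in for dR, e.g. as a transition table inside a statement trigger):

```sql
-- Delta for V = R JOIN S under insertions dR into R:
-- join only the new rows against S and append the result to the MV.
INSERT INTO mv_v (r_id, s_id, r_val, s_val)
SELECT n.id, s.id, n.val, s.val
FROM   new_r n
JOIN   s ON s.r_id = n.id;
```

Deletions propagate symmetrically (delete from the MV the rows produced by old_r ⋈ S), which is what makes a per-operation module structure attractive.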

> There could be several operations in a view definition: selection,
> projection,
> join, aggregation, union, difference, intersection, etc.  If we can
> prepare a
> module for each operation, it makes IVM extensible, so we can start with a
> simple
> view definition, and then support more complex views.

Such a decomposition also allows ’stacking’, allowing complex MV definitions
to be attacked even with only a small handful of modules.

I did a bit of an experiment to see if "change propagation equations” could
be computed directly from the MV’s pg_node_tree representation in the
catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly
to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL
representation (I played with JSONB), then the next problem is making sense
of the structures, and unfortunately amongst the many plan/path/tree utility
functions in the code base, I figured only a very few could be sensibly
exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea,
and I think it could be made to work, native code is the way to go at least
for now.

> 4. When to maintain materialized views
> 
> [...]
> 
> In the previous discussion[4], it is planned to start from the "eager"
> approach. In our PoC
> implementation, we used the other approach, that is, using the REFRESH
> command to perform IVM.
> I am not sure which is better as a starting point, but I begin to think
> that the eager
> approach may be simpler, since we don't have to maintain base table
> changes from other
> past transactions.

Certainly the eager approach allows progress to be made with less
infrastructure.

I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.

In the eager maintenance approach, we have to consider a race condition
where two
different transactions change base tables simultaneously, as discussed in
[4].

I wonder if that nudges towards a logged approach. If the race is due to
the fact that JOIN-worthy tuples are made visible after a COMMIT, but not
before, then does it not follow that the eager approach has to fire some
kind of reconciliation work at COMMIT time? That seems to imply a
persistent queue of some kind, since we can’t assume transactions to be so
small as to be able to hold the queue in memory.

Hmm. I hadn’t really thought about that particular corner case. I guess a
‘catch’ could simply be to detect such a concurrent update and demote the
refresh approach by marking the MV stale, awaiting a full refresh.

denty.





