Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-21 Thread Robert Haas
2010/4/20 Pavel baro...@seznam.cz:
 For now I know it is not commitable in actual state, but for my thesis it is
 enough and I know it will not be commitable with this design at all. In case
 of GSoC it will depends on the time I will be able to spend on it, if I will
 consider some other design.

I am not sure about this, but I would think we would not want to
accept the project unless you intend to try to make it committable.  I
haven't looked at your actual code to see how much work I think that
would take.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-20 Thread Pavel

Greg Smith wrote:


pavelbaros wrote:
I am also waiting for approval for my repository named 
materialized_view on git.postgresql.org, so I could publish 
completed parts. 


Presuming that you're going to wander there and get assigned what 
looks like an official repo name for this project is a 
bit...optimistic.  I would recommend that you publish to something 
like github instead (you can fork http://github.com/postgres/postgres 
), and if the work looks good enough that it gets picked up by the 
community maybe you migrate it onto the main site eventually.  
git.postgresql.org is really not setup to be general hosting space for 
everyone who has a PostgreSQL related project; almost every repo on 
there belongs to someone who has already been a steady project 
contributor for a number of years.


Yes, you're true, I'm kind of newbe in this kind of project and 
specially in PostgreSQL. But I think it is best way to get into 
PostgreSQL. When I chose my bachelor thesis I did not know I could 
participate GSoC or try to make it commitable. Anyway I will make repo 
on github, so everybody could look at it, as soon as posible.


http://github.com/pbaros/postgres
 
(Switching to boilerplate mode for a paragraph...) You have picked a 
PostgreSQL feature that is dramatically more difficult than it appears 
to be, and I wouldn't expect you'll actually finish even a fraction of 
your goals in a summer of work.  You're at least in plentiful 
company--most students do the same.  As a rule, if you see a feature 
on our TODO list that looks really useful and fun to work on, it's 
only still there because people have tried multiple times to build it 
completely but not managed to do so because it's harder than it 
appears.  This is certainly the case with materialized views.


You've outlined a reasonable way to build a prototype that does a 
limited implementation here.  The issue is what it will take to extend 
that into being production quality for the real-world uses of 
materialized views.  How useful your prototype is depends on how well 
it implements a subset of that in a way that will get used by the 
final design.


The main hidden complexity in this particular project relates to 
handling view refreshes.  The non-obvious problem is that when the 
view updates, you need something like a SQL MERGE to really handle 
that in a robust way that doesn't conflict with concurrent access to 
queries against the materialized view.  And work on MERGE support is 
itself blocked behind the fact that PostgreSQL doesn't have a good way 
to lock access to a key value that doesn't exist yet--what other 
databases call key range locking.  See the notes for Add SQL-standard 
MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo 
for more information.


You can work around that to build a prototype by grabbing a full table 
lock on the materialized view when updating it, but that's not a 
production quality solution.  Solving that little detail is actually 
more work than the entire project you've outlined.  Your suggested 
implementation--In function CloseIntoRel executor swap relfilenode's 
of temp table and original table and finally delete temp table--is 
where the full table lock is going to end up at.  The exact use cases 
that need materialized views cannot handle a CLUSTER-style table 
recreation each time that needs an exclusive lock to switchover, so 
that whole part of your design is going to be a prototype that doesn't 
work at all like what needs to get built to make this feature 
committable.  It's also not a reasonable assumption that you have 
enough disk space to hold a second copy of the MV in a production system.


For now I know it is not commitable in actual state, but for my thesis 
it is enough and I know it will not be commitable with this design at 
all. In case of GSoC it will depends on the time I will be able to spend 
on it, if I will consider some other design.




Once there's a good way to merge updates, how to efficiently generate 
them against the sort of large data sets that need materalized 
views--so you just write out the updates rather than a whole new 
copy--is itself a large project with a significant quantity of 
academic research to absorb before starting.  Dan Colish at Portland 
State has been playing around with prototypes for the specific problem 
of finding a good algorithm for view refreshing that is compatible 
with PostgreSQL's execution model.  He's already recognized the table 
lock issue here and for the moment is ignoring that part.  I don't 
have a good feel yet for how long the targeted update code will take 
to mature, but based on what I do know I suspect that little detail is 
also a larger effort than the entire scope you're envisioning.  
There's a reason why the MIT Press compendium Materialized Views: 
Techniques, Implementations, and Applications is over 600 pages 
long--I hope you've already started digging through that material.


I would like 

Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-13 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 There are basically 2 major parts for materialized views:

 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;

 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).

 I do not believe it is possible to do both of the above in one summer.
 Of the two, (A) would be more useful since it is possible to manually
 implement (B) using triggers, queues and cron jobs today.

A patch that implements only (A) will be DOA.  The reason is that the
planner can *never* swap in a MatView on its own authority, unless it
can prove that this does not change the semantics of the query.  Which
it obviously will be unable to do unless there's a fully transparent
continuous-update scheme in place.

So the correct approach is to work on (B) first.  When and if we get to
a sufficiently transparent update implementation, we can think about
changing the planner.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Pavel Stehule
2010/4/12 Robert Haas robertmh...@gmail.com:
 On Sun, Apr 11, 2010 at 5:24 AM, Greg Smith g...@2ndquadrant.com wrote:
 From the rest of your comments, I'm comfortable that you're in sync with the
 not necessarily obvious risky spots here I wanted to raise awareness of.
  It's unreasonable to expect we'll have exactly the same priorities  here,
 and I doubt it's useful to debate how I perceive the merit of various
 development subsets here compared to yourself.  I don't think it's really
 important whether anyone agrees with me or not about exactly the value of a
 full table lock implementation.  The main thing I'm concerned about is just
 that it's noted as a known risky part, one that could end up blocking the
 project's ability to commit even a subset of the proposed patch here.

 I think that one of the things that we need to get our hands around is
 how we're going to distinguish the snapshot flavor of materialized
 view from the continuous update flavor.  By definition, the latter
 will only ever be supportable for a fairly restricted subset of all
 possible queries, and I am assuming that we will not want to decide
 what the behavior is going to be based on the query but rather based
 on what the user specifies.  Anything else seems like it would be have
 the potential for severe POLA violations.  So we need to think now
 about how we'll distinguish between the two flavors.  I imagine some
 sort of syntactic marker would be appropriate; not sure what.

 Reading this thread, I'm starting to grow concerned that some people
 may feel that manually refreshed materialized views are not even worth
 bothering with, because (the argument goes) you could just use some
 table and write a function that updates it.  There's probably some
 truth to that, but I guess my thought is that it would have some value
 as a convenience feature; and eventually we might optimize it to the
 point where it would make more sense to use the built-in feature
 rather than rolling your own.  However, if we're going to have
 complaints that manually refreshed materialized views suck and we
 should only ever support materialized views to the extent that we can
 make them automatically update on-the-fly, then let's have those
 complaints now before someone spends several months of their life on
 the project only to be told that we don't want it.  Let's be clear: I
 think it's useful, but, if other people disagree, we need to iron that
 out now.

 ...Robert

I thing so manually refreshed materialized views has sense. It is
similar to replication - there was replications like slony, but for
some people is more important integrated replication in 9.0. More -
manually refreshed (periodically refreshed) views can share lot if
infrastructure with dynamically actualised views. I am sure so
dynamical materialised views is bad task for GSoC - it is too large,
too complex. Manually refreshed views is adequate to two months work
and it has sense.

Regards
Pavel Stehule


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 2:16 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I am sure so
 dynamical materialised views is bad task for GSoC - it is too large,
 too complex. Manually refreshed views is adequate to two months work
 and it has sense.

That is my feeling also - though I fear that even the simplest
possible implementation of this feature may be a stretch.  Anyway we
agree: keep it simple.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus
On 4/9/10 1:36 PM, pavelbaros wrote:
 2) change rewriter
 - usually, view is relation with defined rule and when rewriting, rule
 is fired and relation (view) is replaced by definition of view. If
 relation do not have rule, planner and executor behave to it as physical
 table (relation). In case of materialized view we want to rewrite select
 statement only in case when we refreshing MV. In other cases rewriter
 should skip rewriting and pick up physical relation. Exclude situation
 when other rewrite rules which are not related to MV definition are
 specified.

This was done (although not completed) against PostgreSQL 7.1 by
students in Georgia, USA, I believe.  It might be worthwhile looking at
their work if I can find it (if nowhere else, it should be in the ACM).

There are basically 2 major parts for materialized views:

A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;

B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).

I do not believe it is possible to do both of the above in one summer.
Of the two, (A) would be more useful since it is possible to manually
implement (B) using triggers, queues and cron jobs today.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 1:50 PM, Josh Berkus j...@agliodbs.com wrote:
 On 4/9/10 1:36 PM, pavelbaros wrote:
 2) change rewriter
 - usually, view is relation with defined rule and when rewriting, rule
 is fired and relation (view) is replaced by definition of view. If
 relation do not have rule, planner and executor behave to it as physical
 table (relation). In case of materialized view we want to rewrite select
 statement only in case when we refreshing MV. In other cases rewriter
 should skip rewriting and pick up physical relation. Exclude situation
 when other rewrite rules which are not related to MV definition are
 specified.

 This was done (although not completed) against PostgreSQL 7.1 by
 students in Georgia, USA, I believe.  It might be worthwhile looking at
 their work if I can find it (if nowhere else, it should be in the ACM).

 There are basically 2 major parts for materialized views:

 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;

 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).

 I do not believe it is possible to do both of the above in one summer.
 Of the two, (A) would be more useful since it is possible to manually
 implement (B) using triggers, queues and cron jobs today.

I don't believe that it's possible to do EITHER of those things in one
summer.  I believe that a basic implementation that has NO bells and
whistles at all, as originally proposed, is going to be a Very Hard
Project.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

There are basically 2 major parts for materialized views:
A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;
B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).
  


I'm run more into problems where it's perfectly fine to specify using 
the materialized view directly in the query, but keeping that view up to 
date usefully was the real problem.  The whole idea of getting a MV used 
automatically is valuable, but far down the roadmap as I see it.


Not everyone would agree of course, and your description does suggest a 
better way to organize a high-level summary though; here's a first cut:


1) Creation of materalized view
Current state:  using CREATE TABLE AS or similar mechanism, maintain 
manually
Optimal:  CREATE MATERIALIZED VIEW grammar, metadata to store MV data, 
dump/reload support


2) Updating materialized views
Current state:  periodically create new snapshots, or maintain using 
triggers
Optimal:  Built-in refresh via multiple strategies, with minimal locking 
as to improve concurrent access


3) Using materialized views in the planner
Current state:  specify the manually created MV in queries that can use it
Optimal:  Automatically accelerate queries that could be satisfied by 
substituting available MVs


With (1) being what I think is the only GSoC sized subset here.

I'm not saying someone can't jump right into (3), using the current 
implementations for (1) and (2) that are floating around out there.  I 
just think it would end up wasting a fair amount of work on prototypes 
that don't work quite the same way as the eventual fully integrated 
version.  You certainly can start working on (3) without a fully fleshed 
out implementation of (2), I don't know that it makes sense to work on 
before (1) though.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 3:43 PM, Greg Smith g...@2ndquadrant.com wrote:
 Josh Berkus wrote:

 There are basically 2 major parts for materialized views:
 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;
 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).


 I'm run more into problems where it's perfectly fine to specify using the
 materialized view directly in the query, but keeping that view up to date
 usefully was the real problem.  The whole idea of getting a MV used
 automatically is valuable, but far down the roadmap as I see it.

 Not everyone would agree of course, and your description does suggest a
 better way to organize a high-level summary though; here's a first cut:

 1) Creation of materalized view
 Current state:  using CREATE TABLE AS or similar mechanism, maintain
 manually
 Optimal:  CREATE MATERIALIZED VIEW grammar, metadata to store MV data,
 dump/reload support

 2) Updating materialized views
 Current state:  periodically create new snapshots, or maintain using
 triggers
 Optimal:  Built-in refresh via multiple strategies, with minimal locking as
 to improve concurrent access

 3) Using materialized views in the planner
 Current state:  specify the manually created MV in queries that can use it
 Optimal:  Automatically accelerate queries that could be satisfied by
 substituting available MVs

 With (1) being what I think is the only GSoC sized subset here.

 I'm not saying someone can't jump right into (3), using the current
 implementations for (1) and (2) that are floating around out there.  I just
 think it would end up wasting a fair amount of work on prototypes that don't
 work quite the same way as the eventual fully integrated version.  You
 certainly can start working on (3) without a fully fleshed out
 implementation of (2), I don't know that it makes sense to work on before
 (1) though.

Good summary.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus
Greg,

 I'm not saying someone can't jump right into (3), using the current
 implementations for (1) and (2) that are floating around out there.  I
 just think it would end up wasting a fair amount of work on prototypes
 that don't work quite the same way as the eventual fully integrated
 version.  You certainly can start working on (3) without a fully fleshed
 out implementation of (2), I don't know that it makes sense to work on
 before (1) though.

What would be the use case for (1) by itself?


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

What would be the use case for (1) by itself?
  


There isn't any use case for just working on the infrastructure, just 
like there's no use case for Syntax for partitioning on its own.  That 
why people rarely work on that part of these problems--it's boring and 
produces no feature of value on its own.  I believe that in both cases, 
attempts to build the more complicated parts, ones that don't first 
address some of the core infrastructure first, will continue to produce 
only prototypes.


I don't want to see Materialized Views wander down the same path as 
partitioning, where lots of people produce fun parts patches, while 
ignoring the grunt work of things like production quality catalog 
support for the feature.  I think Pavel's proposal got that part right 
by starting with the grammar and executor setup trivia.  And Robert's 
comments about the details in that area it's easy to forget about hit 
the mark too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus

 I don't want to see Materialized Views wander down the same path as
 partitioning, where lots of people produce fun parts patches, while
 ignoring the grunt work of things like production quality catalog
 support for the feature.  I think Pavel's proposal got that part right
 by starting with the grammar and executor setup trivia.  And Robert's
 comments about the details in that area it's easy to forget about hit
 the mark too.

Good point.  And GSoC may be one of the few times we can get people to
do that kind of work.  Other than Simon, of course.  ;-)

I just worry about any feature which doesn't get as far as a
user-visible implementation.  If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

I just worry about any feature which doesn't get as far as a
user-visible implementation.  If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.
  


While they're limited, there are complexly viable prototype quality 
implementations possible here without a large amount of work to get them 
started.  I'm not worried too much about this feature being unused.  As 
I was just reminded when assembling an page on the wiki about it:  
http://wiki.postgresql.org/wiki/Materalized_Views it's currently ranked 
#1--by a large margin--on the UserVoice feature request survey that 
Peter kicked off.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Greg Smith

Robert Haas wrote:

I also think that you're underestimating the number of problems that
will have to be solved to get this done.  It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system.  (What should
be the value of pg_class.relkind?  Where should the node
representation of the snapshot query be stored?  And did we handle all
of those OID dependencies correctly?)
  


I don't think I'm underestimating all that, but I suspect Pavel is by a 
considerable amount.  This is why I've been suggesting that a GSoC scope 
here might just be wrestling with this area of the problem for the whole 
summer--not even getting into updates beyond a completely trivial 
implementation, if any at all.  Things like handle OID dependencies 
are definitely not on the fun side of the development work that people 
tend to think about in advance.



Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views.  I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner.


Exactly my concern; comitting this part without knowing how that's later 
going to fit into place strikes me the sort of the thing this project 
doesn't like to do.  The alternate approach of starting with the update 
machinery is less likely IMHO to get stuck wondering if there's a future 
blind spot coming or not, since you'd be building from the bottom up 
starting with the hardest parts.


From the rest of your comments, I'm comfortable that you're in sync 
with the not necessarily obvious risky spots here I wanted to raise 
awareness of.  It's unreasonable to expect we'll have exactly the same 
priorities  here, and I doubt it's useful to debate how I perceive the 
merit of various development subsets here compared to yourself.  I don't 
think it's really important whether anyone agrees with me or not about 
exactly the value of a full table lock implementation.  The main thing 
I'm concerned about is just that it's noted as a known risky part, one 
that could end up blocking the project's ability to commit even a subset 
of the proposed patch here.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Heikki Linnakangas
Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.
 
 Yeah, I agree.

It doesn't accomplish anything interesting on its own. But if you do the
planner changes to automatically use the materialized view to satisfy
queries (item 2. in my previous email), it's useful.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.

 Yeah, I agree.

 It doesn't accomplish anything interesting on its own. But if you do the
 planner changes to automatically use the materialized view to satisfy
 queries (item 2. in my previous email), it's useful.

But you can't do that with a snapshot view, only a continuous updated one.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Florian G. Pflug

On 11.04.10 20:47 , Robert Haas wrote:

On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Robert Haas wrote:

2010/4/10 Andrew Dunstanand...@dunslane.net:

Heikki Linnakangas wrote:

1. Keep the materialized view up-to-date when the base tables
change. This can be further divided into many steps, you can
begin by supporting automatic updates only on very simple
views with e.g a single table and a where clause. Then extend
that to support joins, aggregates, subqueries etc. Keeping it
really limited, you could even require the user to write the
required triggers himself.

That last bit doesn't strike me as much of an advance. Isn't
the whole point of this to automate it? Creating greedy
materialized views is usually not terribly difficult now, but
you do have to write the triggers.


Yeah, I agree.


It doesn't accomplish anything interesting on its own. But if you
do the planner changes to automatically use the materialized view
to satisfy queries (item 2. in my previous email), it's useful.


But you can't do that with a snapshot view, only a continuous updated
one.


If continuous updates prove to be too hard initially, you could instead
update the view on select if it's outdated. Such a materialized view
would be a kind of inter-session cache for subselects.

The hard part would probably be to figure out how to decide whether the
view is outdated or not, and to deal with two concurrent transactions
trying to use an outdates view (and both trying to refresh it). What
makes the second problem hard is that you wouldn't want one of the
transactions to wait for the other to complete, because this is not how
SELECTs traditionally behave.

best regards, Florian Pflug



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 10:13 PM, Florian G. Pflug f...@phlo.org wrote:
 If continuous updates prove to be too hard initially, you could instead
 update the view on select if it's outdated. Such a materialized view
 would be a kind of inter-session cache for subselects.

 The hard part would probably be to figure out how to decide whether the
 view is outdated or not, and to deal with two concurrent transactions
 trying to use an outdates view (and both trying to refresh it). What
 makes the second problem hard is that you wouldn't want one of the
 transactions to wait for the other to complete, because this is not how
 SELECTs traditionally behave.

Well, the proposed project is to create views that only get refreshed manually.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 5:24 AM, Greg Smith g...@2ndquadrant.com wrote:
 From the rest of your comments, I'm comfortable that you're in sync with the
 not necessarily obvious risky spots here I wanted to raise awareness of.
  It's unreasonable to expect we'll have exactly the same priorities  here,
 and I doubt it's useful to debate how I perceive the merit of various
 development subsets here compared to yourself.  I don't think it's really
 important whether anyone agrees with me or not about exactly the value of a
 full table lock implementation.  The main thing I'm concerned about is just
 that it's noted as a known risky part, one that could end up blocking the
 project's ability to commit even a subset of the proposed patch here.

I think that one of the things that we need to get our hands around is
how we're going to distinguish the snapshot flavor of materialized
view from the continuous update flavor.  By definition, the latter
will only ever be supportable for a fairly restricted subset of all
possible queries, and I am assuming that we will not want to decide
what the behavior is going to be based on the query but rather based
on what the user specifies.  Anything else seems like it would be have
the potential for severe POLA violations.  So we need to think now
about how we'll distinguish between the two flavors.  I imagine some
sort of syntactic marker would be appropriate; not sure what.

Reading this thread, I'm starting to grow concerned that some people
may feel that manually refreshed materialized views are not even worth
bothering with, because (the argument goes) you could just use some
table and write a function that updates it.  There's probably some
truth to that, but I guess my thought is that it would have some value
as a convenience feature; and eventually we might optimize it to the
point where it would make more sense to use the built-in feature
rather than rolling your own.  However, if we're going to have
complaints that manually refreshed materialized views suck and we
should only ever support materialized views to the extent that we can
make them automatically update on-the-fly, then let's have those
complaints now before someone spends several months of their life on
the project only to be told that we don't want it.  Let's be clear: I
think it's useful, but, if other people disagree, we need to iron that
out now.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Heikki Linnakangas
Robert Haas wrote:
 On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole 
 point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.
 Yeah, I agree.
 It doesn't accomplish anything interesting on its own. But if you do the
 planner changes to automatically use the materialized view to satisfy
 queries (item 2. in my previous email), it's useful.
 
 But you can't do that with a snapshot view, only a continuous updated one.

A materialized view with manually-defined triggers to keep it up-to-date
is a continuously updated one.

Other DBMSs allow that with snapshot views too, you just don't get
totally up-to-date results, but I not sure we want to go there.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Heikki Linnakangas
Greg Smith wrote:
 The main hidden complexity in this particular project relates to
 handling view refreshes.  The non-obvious problem is that when the view
 updates, you need something like a SQL MERGE to really handle that in a
 robust way that doesn't conflict with concurrent access to queries
 against the materialized view.  And work on MERGE support is itself
 blocked behind the fact that PostgreSQL doesn't have a good way to lock
 access to a key value that doesn't exist yet--what other databases call
 key range locking.  See the notes for Add SQL-standard
 MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo
 for more information.
 
 You can work around that to build a prototype by grabbing a full table
 lock on the materialized view when updating it, but that's not a
 production quality solution.

It would still be useful for many applications. And it would provide a
basis to extend later. You don't need to solve all problems at once, as
long as what you implement is a useful subset.

 Now, with all that said, that doesn't mean there's not a useful project
 for you buried in this mess.  The first two steps in your plan:
 
 1) create materialized view
 2) change rewriter
 
 Include building a prototype grammer, doing an initial executor
 implementation, and getting some sort of rewriter working.  That is
 potentially good groundwork to lay here.  I would suggest that you
 completely drop your step 3:
 
 3) create command that takes snapshot (refresh MV)
 
 Because you cannot built that in a way that will be useful (and by that
 I mean committable quality) until there's a better way to handle updates
 than writing a whole new table and grabbing a full relation lock to
 switch to it.  To do a good job just on the first two steps should take
 at least a whole summer anyway--there's a whole stack of background
 research needed I haven't seen anyone do yet, and that isn't on your
 plan yet.  There is a precedent for taking this approach.  After getting
 stalled trying to add the entirety of easy partitioning to PostgreSQL,
 the current scope has been scaled back to just trying to get the syntax
 and on-disk structure right, then finish off the implementation.  See
 http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
 that's been broken into those two major chunks.

The good thing about this subject for GSoC is that it can be divided
into many small steps. There's two largely independent main parts:

1. Keep the materialized view up-to-date when the base tables change.
This can be further divided into many steps, you can begin by supporting
automatic updates only on very simple views with e.g a single table and
a where clause. Then extend that to support joins, aggregates,
subqueries etc. Keeping it really limited, you could even require the
user to write the required triggers himself.

2. Teach the planner to use materialized views automatically when a
query references the base tables. So if you issue the query SELECT *
FROM table WHERE foo  10 AND bar = 10, and there's a materialized view
on SELECT * FROM table WHERE bar = 10, the planner can transform the
original query into SELECT * FROM materializedview WHERE foo  10.
This largely depends on 1, although some DBMSs offer the option to use
manually refreshed materialized views too, knowing that they might not
be completely up-to-date.

There's a lot room to choose which problems you want to tackle, which is
good for a summer-of-code project. Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing CREATE
MATERIALIZED VIEW ... SELECT ... doesn't seem any easier than just
writing CREATE TABLA AS  But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Andrew Dunstan



Heikki Linnakangas wrote:


1. Keep the materialized view up-to-date when the base tables change.
This can be further divided into many steps, you can begin by supporting
automatic updates only on very simple views with e.g a single table and
a where clause. Then extend that to support joins, aggregates,
subqueries etc. Keeping it really limited, you could even require the
user to write the required triggers himself.

  


That last bit doesn't strike me as much of an advance. Isn't the whole 
point of this to automate it? Creating greedy materialized views is 
usually not terribly difficult now, but you do have to write the triggers.


The other thing that could be interesting about this would be some 
scheme for lazy refresh that didn't involve re-extracting the whole data 
set.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Robert Haas
2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:

 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.

 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.

Yeah, I agree.

 The other thing that could be interesting about this would be some scheme
 for lazy refresh that didn't involve re-extracting the whole data set.

One way to do this would be to infer a primary key for the result set
based on the input query.  But I think we don't really have the
infrastructure to do this right now, so not really a project for a
beginner.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Kevin Grittner
Greg Smith  wrote:
 
 And work on MERGE support is itself blocked behind the fact that
 PostgreSQL doesn't have a good way to lock access to a key value
 that doesn't exist yet--what other databases call key range
 locking.
 
The bulk of the serializable implementation WIP is work to implement
just this sort of locking.  There are already a couple possible
spin-off uses on the horizon based on the ability of these locks to
survive their initiating transactions and detect conflicting writes.
Both spinoffs involve somehow flagging a transaction as being one for
which the locks should be kept until further notice, and issuing a
notification when a conflicting write occurs.  That seems consistent
with the needs of materialized views, too.
 
It probably won't be solid in time to be useful for GSoC, but if
someone's looking to map out a plan for materialized views, I thought
this information might be germane.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Greg Smith

Heikki Linnakangas wrote:

Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing CREATE
MATERIALIZED VIEW ... SELECT ... doesn't seem any easier than just
writing CREATE TABLA AS  But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.
  


You've hit upon the core issue here.  You can build materialized views 
right now using CREATE TABLE AS.  You can even update them by creating 
a new table the same way, with a new name, and doing the 
LOCK/RENAME/DROP shuffle--what people used to do for rewriting bloated 
tables before there was CLUSTER.  The first step in the proposal here is 
essentially syntax to give an easier UI for that.  That's an interesting 
step, but recognize that it doesn't actually provide anything you can't 
do already.


If you then note that doing any sort of incremental update to the view 
is a hard problem, and that a lot of the useful cases for materialized 
views involve tables where it's impractical to recreate the whole thing 
anyway, you'll inevitably find yourself deeply lost in the minutia of 
how to handle the updates.  It's really the core problem in building 
what people expect from a materialized view implementation in a serious 
database.  Chipping away at the other pieces around it doesn't move the 
feature that far forward, even if you get every single one of them 
except incremental updates finished, because everything else combined is 
still not that much work in comparison to the issues around updates.


There certainly are a fair number of subproblems you can break out of 
here.  I just think it's important to recognize that the path that leads 
to a useful GSoC project and the one that gives a production quality 
materialized view implementation may not have that much in common, and 
to manage expectations on both sides accordingly.  If Pavel thinks he's 
going to end up being able to say I added materialized views to 
PostgreSQL at the end of the summer, that's going to end in 
disappointment.  And if people think this project plan will lead to 
being able to claim PostgreSQL now has this feature, that's also not 
going to go well.  If the scope is add initial grammar and rewriting 
moving toward a future materialized view feature, which the underlying 
implementation noted as a stub prototype, that might work out OK.  This 
is why I likened it to the work on Syntax for partitioning, which has 
a similarly focused subgoal structure.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Greg Smith

Robert Haas wrote:

It's not obvious to me
that a brief full-table lock wouldn't be acceptable for an initial
implementation.  Obviously it wouldn't be suitable for every use case
but since we're talking about manually refreshed views that was bound
to be true anyway.
  


There already is an initial implementation of sorts.  There are a couple 
of ways you can build these right now, so any new development has to 
look like it will end with good odds of being an improvement over what's 
already available before it's worth putting development resources into.


As a rough idea of what people want these for in the field, based on 
what I've seen requests for, imagine that someone has a 1TB table 
they're materializing a view on in order to get at least a 10:1, and 
hopefully close to a 100:1, speedup on viewing summary data.  Now, 
picture what happens if you have someone doing a sequential scan on the 
MV, which is still quite big, the updater process lines up to grab an 
exclusive lock when it's done, and now a second user wanting to read a 
single row quickly comes along behind it.  Given a full-table lock 
implementation, that scenario is unlikely to play out with the second 
user getting a fast response.  They'll likely sit in a lock queue for 
some potentially long period of time instead, waiting for the active seq 
scan to finish then the update to happen.  You have to build it that way 
or a steady stream of people reading could block out updates forever.


To be frank, that makes for a materalized view implementation of little 
value over what you can currently do as far as I'm concerned.  It might 
be interesting as a prototype, but that's not necessarily going to look 
like what's needed to do this for real at all.  I'm not a big fan of 
dumping work into projects when you can see exactly how it's going to 
fail before you even get started.  As I see if, if you know where it's 
going to fall down, you don't need to build a prototype as an exercise 
to show you how to build it--you should work on that part first instead.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Robert Haas
On Sat, Apr 10, 2010 at 11:40 PM, Greg Smith g...@2ndquadrant.com wrote:
 To be frank, that makes for a materalized view implementation of little
 value over what you can currently do as far as I'm concerned.  It might be
 interesting as a prototype, but that's not necessarily going to look like
 what's needed to do this for real at all.  I'm not a big fan of dumping work
 into projects when you can see exactly how it's going to fail before you
 even get started.  As I see if, if you know where it's going to fall down,
 you don't need to build a prototype as an exercise to show you how to build
 it--you should work on that part first instead.

Hopefully, you're already aware that I have enormous respect for your
opinions on a wide variety of topics; if not, let me publicly say that
I absolutely do.

Having said that, I disagree with your conclusions in this instance.
I see nothing but upside from this work.  It is vastly easier to write
a patch that builds on existing functionality than it is to write
something new from scratch.  If there's any value in having manually
refreshed materialized views, then having the simplest possible
implementation of what those can look like committed will make it far
easier to plan out next steps.  While the proposed implementation may
not solve a huge number of real-world problems, I think there's a good
argument that some people will get good use of it.  Not everyone has
1TB tables with continuous access patterns.  And, provided that it
doesn't conflict with anything we want to do in the future, being
useful to some people is a good enough reason to put it in.

I also think that you're underestimating the number of problems that
will have to be solved to get this done.  It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system.  (What should
be the value of pg_class.relkind?  Where should the node
representation of the snapshot query be stored?  And did we handle all
of those OID dependencies correctly?)

Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views.  I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner.  But as far as snapshot
views go, complaining that the proposed locking is too strong doesn't
seem quite fair.  Fixing that, AFAICS, is a very hard project,
possibly involving significant planner support and an implementation
of MERGE, and I would much rather try to land a fundamentals patch
like this first and then deal with the gyrations that will be involved
in making this work than try to land the whole thing all at once.

Of course, if I'm missing something, and there's a SIMPLE way to get
materialized views that can be refreshed without a full-table lock,
that's another story altogether - maybe you have an idea?

Finally, even if we decided NOT to merge this patch because of the
limitations you mention (and right now that doesn't seem to be the
consensus), having this part of it completed as a starting point for
future work might be reason enough by itself.

In short: I think you may be letting the perfect be the enemy of the good.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Kevin Grittner
pavelbaros baro...@seznam.cz wrote:
 
 I am also waiting for approval for my repository named
 materialized_view on git.postgresql.org
 
They seem to prefer that you get a repository under your name and
use materialized_view as a branch name.  See my account on
git.postgresql.org and its serializable branch for an example.
 
I learned by putting in a request similar to your pending one.
;-)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Robert Haas
2010/4/9 pavelbaros baro...@seznam.cz:
 Implementation:  could be divided to few steps:

 1) create materialized view
 - modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ...
 - change executor, so that it will create physical table defined by select
 statement

This basically needs to work the same was as CREATE TABLE ... AS
SELECT ... - save that it should also stuff the rewritten query
someplace, so that it can be re-executed.  I think one of the
important design questions here is figuring out exactly where that
someplace should be.

I also suspect that we want to block any write access to the relation
except for view refreshes.  IOW, INSERT, UPDATE, and DELETE on the
underlying relation should be rejected (though perhaps rewrite rules
redirecting such operations to other tables could be allowed).

 2) change rewriter
 - usually, view is relation with defined rule and when rewriting, rule is
 fired and relation (view) is replaced by definition of view. If relation do
 not have rule, planner and executor behave to it as physical table
 (relation). In case of materialized view we want to rewrite select statement
 only in case when we refreshing MV. In other cases rewriter should skip
 rewriting and pick up physical relation. Exclude situation when other
 rewrite rules which are not related to MV definition are specified.

 3) create command that takes snapshot (refresh MV)
 - modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH;
 - taking snapshot (refreshing) is similar to command SELECT INTO ... and I
 decided to follow the way it works. After parsing query and before
 transformation is MANUALLY created tree representation of SELECT * INTO
 ... with flag IntoClause-isrefresh set true, indicating it is refreshing
 materialized view. Everithing acts as it would be regular SELECT INTO ...
 except functions OpenIntoRel() and CloseIntoRel(). In function OpenIntoRel
 is created temp table (without catalog) and set as destination for result of
 select. In function CloseIntoRel executor swap relfilenode's of temp table
 and original table and finally delete temp table. Behavior of CloseIntoRel
 function is inspired by CLUSTER statement.

I'll have to read the code before I can comment on the rest of this in detail.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Greg Smith

pavelbaros wrote:
I am also waiting for approval for my repository named 
materialized_view on git.postgresql.org, so I could publish 
completed parts. 


Presuming that you're going to wander there and get assigned what looks 
like an official repo name for this project is a bit...optimistic.  I 
would recommend that you publish to something like github instead (you 
can fork http://github.com/postgres/postgres ), and if the work looks 
good enough that it gets picked up by the community maybe you migrate it 
onto the main site eventually.  git.postgresql.org is really not setup 
to be general hosting space for everyone who has a PostgreSQL related 
project; almost every repo on there belongs to someone who has already 
been a steady project contributor for a number of years.


(Switching to boilerplate mode for a paragraph...) You have picked a 
PostgreSQL feature that is dramatically more difficult than it appears 
to be, and I wouldn't expect you'll actually finish even a fraction of 
your goals in a summer of work.  You're at least in plentiful 
company--most students do the same.  As a rule, if you see a feature on 
our TODO list that looks really useful and fun to work on, it's only 
still there because people have tried multiple times to build it 
completely but not managed to do so because it's harder than it 
appears.  This is certainly the case with materialized views.


You've outlined a reasonable way to build a prototype that does a 
limited implementation here.  The issue is what it will take to extend 
that into being production quality for the real-world uses of 
materialized views.  How useful your prototype is depends on how well it 
implements a subset of that in a way that will get used by the final design.


The main hidden complexity in this particular project relates to 
handling view refreshes.  The non-obvious problem is that when the view 
updates, you need something like a SQL MERGE to really handle that in a 
robust way that doesn't conflict with concurrent access to queries 
against the materialized view.  And work on MERGE support is itself 
blocked behind the fact that PostgreSQL doesn't have a good way to lock 
access to a key value that doesn't exist yet--what other databases call 
key range locking.  See the notes for Add SQL-standard 
MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo 
for more information.


You can work around that to build a prototype by grabbing a full table 
lock on the materialized view when updating it, but that's not a 
production quality solution.  Solving that little detail is actually 
more work than the entire project you've outlined.  Your suggested 
implementation--In function CloseIntoRel executor swap relfilenode's of 
temp table and original table and finally delete temp table--is where 
the full table lock is going to end up at.  The exact use cases that 
need materialized views cannot handle a CLUSTER-style table recreation 
each time that needs an exclusive lock to switchover, so that whole part 
of your design is going to be a prototype that doesn't work at all like 
what needs to get built to make this feature committable.  It's also not 
a reasonable assumption that you have enough disk space to hold a second 
copy of the MV in a production system.


Once there's a good way to merge updates, how to efficiently generate 
them against the sort of large data sets that need materalized views--so 
you just write out the updates rather than a whole new copy--is itself a 
large project with a significant quantity of academic research to absorb 
before starting.  Dan Colish at Portland State has been playing around 
with prototypes for the specific problem of finding a good algorithm for 
view refreshing that is compatible with PostgreSQL's execution model.  
He's already recognized the table lock issue here and for the moment is 
ignoring that part.  I don't have a good feel yet for how long the 
targeted update code will take to mature, but based on what I do know I 
suspect that little detail is also a larger effort than the entire scope 
you're envisioning.  There's a reason why the MIT Press compendium 
Materialized Views: Techniques, Implementations, and Applications is 
over 600 pages long--I hope you've already started digging through that 
material.


Now, with all that said, that doesn't mean there's not a useful project 
for you buried in this mess.  The first two steps in your plan:


1) create materialized view
2) change rewriter

Include building a prototype grammer, doing an initial executor 
implementation, and getting some sort of rewriter working.  That is 
potentially good groundwork to lay here.  I would suggest that you 
completely drop your step 3:


3) create command that takes snapshot (refresh MV)

Because you cannot built that in a way that will be useful (and by that 
I mean committable quality) until there's a better way to handle updates 
than writing a whole new table and grabbing a full 

Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Robert Haas
2010/4/9 Greg Smith g...@2ndquadrant.com:
 The main hidden complexity in this particular project relates to handling
 view refreshes.  The non-obvious problem is that when the view updates, you
 need something like a SQL MERGE to really handle that in a robust way that
 doesn't conflict with concurrent access to queries against the materialized
 view.  And work on MERGE support is itself blocked behind the fact that
 PostgreSQL doesn't have a good way to lock access to a key value that
 doesn't exist yet--what other databases call key range locking.  See the
 notes for Add SQL-standard MERGE/REPLACE/UPSERT command at
 http://wiki.postgresql.org/wiki/Todo for more information.

 You can work around that to build a prototype by grabbing a full table lock
 on the materialized view when updating it, but that's not a production
 quality solution.  Solving that little detail is actually more work than the

Hmm... I am not sure you're right about this.  It's not obvious to me
that a brief full-table lock wouldn't be acceptable for an initial
implementation.  Obviously it wouldn't be suitable for every use case
but since we're talking about manually refreshed views that was bound
to be true anyway.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers