Re: [HACKERS] matview niceties: pick any two of these three

2013-05-06 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 The long and the short of it is this: having unlogged matviews in 9.3
 is not worth taking that risk for.  IMO anyway.

FWIW, +1

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview niceties: pick any two of these three

2013-05-04 Thread Tom Lane
... btw, I noticed a minor misfeature in the current implementation:

regression=# select pg_relation_size('int8_tbl');
 pg_relation_size 
--
 8192
(1 row)

regression=# create materialized view mv1 as select * from int8_tbl;
SELECT 5
regression=# select pg_relation_size('mv1');
 pg_relation_size 
--
16384
(1 row)

So when populating a matview, we fail to make any use at all of the
initially-added page.  On the other hand,

regression=# vacuum full mv1;
VACUUM
regression=# select pg_relation_size('mv1');
 pg_relation_size 
--
 8192
(1 row)

regression=# refresh materialized view mv1;
REFRESH MATERIALIZED VIEW
regression=# select pg_relation_size('mv1');
 pg_relation_size 
--
16384
(1 row)

I haven't looked into why the VACUUM FULL code path is able to make use
of the initially-created page while the CREATE/REFRESH code path can't.
Possibly it's due to some innocent-looking difference in order of
operations.  The details aren't really too relevant though.  Rather, my
point is that IMO this sort of bug is an inevitable consequence of the
layering violation that's at the heart of the current matview design.
If we stick with this design, I'm afraid we'll be squashing bugs of this
kind till kingdom come, and some of them may be much more painful to fix
than the ones we've found to date.  Layering violations tend to beget
more layering violations.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview niceties: pick any two of these three

2013-05-04 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:

 What do you see that I'm missing?

 TBH, if I had 20-20 foresight, we'd not be having this
 discussion: either I could see that you're right and this patch
 isn't going to cause us enormous pain, or I could put my finger
 on exactly where and why it's going to hurt us.  But I can't do
 the latter today.  Nonetheless, this patch terrifies me.  It's
 ugly, it's a serious layering violation, and it flies in the face
 of very-long-standing assumptions about the semantics of heap
 storage.  My gut tells me that we *will* regret shipping things
 this way.  Nor am I impressed with the amount of functionality
 we're gaining by taking such a risk.

OK, I think there are more votes for removing unlogged matviews for
9.3 than for any other option, and it's time to make a call; so I'm
conceding the point.  Do you want me to take a shot at undoing that
and straightening things out, or given the short time and your
superior grasp of the layer boundaries, would you prefer to take
it?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview niceties: pick any two of these three

2013-05-04 Thread Gavin Flower

On 05/05/13 10:41, Kevin Grittner wrote:
[...]

removing unlogged matviews for 9.3

[...]

So you are going to unlog the unlogged matviews from 9.3?  :-)

More seriously, I think this will result in a better solution for 9.4 - 
as it seems to me to be too rushed to fix it now



Cheers,
Gavin.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview niceties: pick any two of these three

2013-05-03 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 As I understand it, we don't currently have any mechanism in Postgres
 which would cause allocated-but-empty pages.

That's not correct: the situation can easily arise after a database
crash.  (The scenario is that we've done smgrextend to add the first
page to the file, but not yet completed or WAL-logged insertion of any
data into it.  This leaves us with an empty, all-zero page that will be
ignored until we next want to add some data to the table.)

The core problem here is that file extension is not a transactional
operation, because it doesn't roll back on crash.

The current matview design gets around this problem by requiring that
transition between scannable and unscannable states involve a complete
table rewrite, and thus the transactionality issue can be hidden behind
a transactional update of the matview's pg_class.relfilenode field.
IMO, that is obviously a dead-end design, because we are going to want
scannability status updates associated with partial updates of the
matview's contents.  So Kevin's summary is leaving out one key desirable
property:

(4) ability to change scannability state without a full table rewrite.

Putting the state into pg_class would preserve that property.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview niceties: pick any two of these three

2013-05-03 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 The current matview design gets around this problem by requiring
 that transition between scannable and unscannable states involve
 a complete table rewrite, and thus the transactionality issue can
 be hidden behind a transactional update of the matview's
 pg_class.relfilenode field.  IMO, that is obviously a dead-end
 design, because we are going to want scannability status updates
 associated with partial updates of the matview's contents.

I don't think the change between populated and non-populated needs
to change without a new heap being swapped in.  To do incremental
maintenance, one must be starting with a populated heap.  What will
change without a new heap is the contents of the matview and
whether a populated matview is fresh enough to be scanned for
some specific usage.  There's a lot of bikeshedding left to be done
on what options we need there and how they will be stored; but I
don't see how that threatens the ability to use a zero-length heap
to indicate that a matview has not been populated.  Scannability
and the populated state only happen to be the same thing for the
moment, but improving the scannability test does not require a new
mechanism for determining whether the matview has been populated.
A new populated mechanism is, we all agree, desirable; but it's
not necessary to improve scannability testing.  We'll get into all
kinds of design trouble if we conflate these two separate concepts.

 Kevin's summary is leaving out one key desirable property:

 (4) ability to change scannability state without a full table
 rewrite.

 Putting the state into pg_class would preserve that property.

I could probably add a hundred other desirable properties for a
materialized view implementation; I was trying to list the ones
which were still up-for-grabs for the initial 9.3 implementation. 
If we want to make progress on catching up to other major databases
on this, it will take progress in many releases; and every one of
those will lack something that someone wants.  OK, a lot of things
that a lot of people want.  But if we can't commit something that
is less than complete or less than perfect to make incremental
change, we're not going to be getting anywhere very fast.  I'm not
sure why you would want to make a non-populated matview scannable,
anyway; which seems like the only reason to have (4) without more
refined maintenance techniques.

I plan to submit a patch to the first 9.4 CF for differential
updates (generating the new contents in a temporary heap and
transactionally applying the changes to the existing matview heap)
as a REFRESH option.  I'm working on estimates for the amount of
work needed to support delta-driven incremental changes to matview
contents based on the declaration of the matview for various types
of matviews.  On a preliminary basis, I think that we can have fast
incremental updates in 9.4 without any imperative coding for all
non-recursive views, although including support in 9.4 for
incremental updates for views with window functions is still iffy,
and some desirable optimizations for aggregates might not make it.

I'm really very sure that incremental updates for recursive
matviews will not be a 9.4 feature, since that is best handled by
the Delete and Rederive (DRed) algorithm, while incremental
maintenance for non-recursive matviews is best handled with the
counting algorithm, and it seems unlikely that we'll get both
into one major release.

If I thought that the current hack for tracking the populated state
would be permanent, or that it would be so hard to replace in a
later release that it would impede further development, I would be
the first one to rip it out.  I haven't been able to see such a
risk, and nobody has adequately explained where they think the risk
lies.  The crash during extension of the first page argument
doesn't hold up unless you conflate the populated state with the
scannable state, and it seems likely that the scannable will not
only be separate, but able to differ for among backend processes.

If adding unlogged materialized views requires reworking support
for unlogged relations in general in a way that puts the data into
the catalog, and that becomes my issue to solve, I can deal with
that -- but that time would come out of time I would otherwise be
spending working on supporting incremental update of certain
classes of materialized views, and progress there will be slower
than I've been hoping.  It's all about trade-offs.  If the unlogged
recovery changes wind up on my plate, for example, it might mean
that the next release won't be able to have incremental updates for
matviews with a NOT EXISTS clause, since support for that looks to
be roughly the same amount of work.

I will certainly go along with the consensus, but from my
perspective, living with the current hack for determining whether a
matview has been populated is worth the benefit of having unlogged
matviews in 9.3.  I don't expect that 

Re: [HACKERS] matview niceties: pick any two of these three

2013-05-03 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 The current matview design gets around this problem by requiring
 that transition between scannable and unscannable states involve
 a complete table rewrite, and thus the transactionality issue can
 be hidden behind a transactional update of the matview's
 pg_class.relfilenode field.  IMO, that is obviously a dead-end
 design, because we are going to want scannability status updates
 associated with partial updates of the matview's contents.

 I don't think the change between populated and non-populated needs
 to change without a new heap being swapped in.

I think you are using the distinction between populated and
scannable to dodge the issue.  My point is that whenever we add some
better control over whether matviews are considered too-stale-to-use,
it is going to be important that the staleness state can be changed, in
either direction, without a full rewrite of the matview's contents.
Whether you call that populated, or scannable, or stale, or warm
fuzzy doesn't matter to me.  If we can't do that then the feature is
not actually going to be of any great use.

And I want such updates to be transactional, because they'll very likely
be connected to transactional updates of the matview's contents.
So I continue to maintain that the current design is a dead end.

 We'll get into all
 kinds of design trouble if we conflate these two separate concepts.

Perhaps I still don't understand what useful distinction is there.
If there is a critical distinction, why is the current patch exposing
only one state value to users?

 I could probably add a hundred other desirable properties for a
 materialized view implementation; I was trying to list the ones
 which were still up-for-grabs for the initial 9.3 implementation. 
 If we want to make progress on catching up to other major databases
 on this, it will take progress in many releases; and every one of
 those will lack something that someone wants.  OK, a lot of things
 that a lot of people want.  But if we can't commit something that
 is less than complete or less than perfect to make incremental
 change, we're not going to be getting anywhere very fast.

Yes, exactly.  I am not sure why you are so desperate to have unlogged
matviews in 9.3 that you are willing to risk finding yourself boxed in
and unable to make future improvements that everybody agrees are useful.
ISTM that plain matviews are a perfectly good first-cut feature and we
can add unlogged ones whenever we can think of a better way to do it.

 If I thought that the current hack for tracking the populated state
 would be permanent, or that it would be so hard to replace in a
 later release that it would impede further development, I would be
 the first one to rip it out. I haven't been able to see such a
 risk, and nobody has adequately explained where they think the risk
 lies.

You can't update that state transactionally, and you can't update it
without a complete rewrite.  The only solutions you've offered to that
are devices Rube Goldberg would be proud of (not that the code wouldn't
fill his heart with joy as it is).  This is not a good place to be for
a first-cut implementation, at least not if we don't want to find
ourselves needing to make a representation change that pg_upgrade can't
readily cope with.

The long and the short of it is this: having unlogged matviews in 9.3
is not worth taking that risk for.  IMO anyway.

 I will certainly go along with the consensus, but from my
 perspective, living with the current hack for determining whether a
 matview has been populated is worth the benefit of having unlogged
 matviews in 9.3.  I don't expect that hack to be viable for more
 that one or two releases; I just think that it's a practical matter
 of working with the tools we have *now* for unlogged relations to
 try to create a new type of unlogged relation.

If you yourself admit that this hack has got a very short life
expectancy, why are you so willing to buy into the assumption that
you can get rid of it readily?  I do not grant that assumption,
and I'm not sure how come you think that hacking pg_upgrade to the point
of being able to get rid of it will be zero-cost, or even acceptable
to the community at all.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview niceties: pick any two of these three

2013-05-03 Thread Peter Eisentraut
On 5/2/13 6:00 PM, Kevin Grittner wrote:
 (1)  The ability to count on the results from a query which
 references a matview to reflect valid data from *some* point in
 time.
 
 (2)  The ability to create unlogged materialized views.
 
 (3)  The ability to consider a zero-length matview heap and a
 matview heap with allocated-but-empty pages as logically identical.

I have thought from the very beginning that violating (3) was totally
weird.  I would ditch (2) in favor of (3).



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview niceties: pick any two of these three

2013-05-03 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 The current matview design gets around this problem by
 requiring that transition between scannable and unscannable
 states involve a complete table rewrite, and thus the
 transactionality issue can be hidden behind a transactional
 update of the matview's pg_class.relfilenode field.  IMO, that
 is obviously a dead-end design, because we are going to want
 scannability status updates associated with partial updates of
 the matview's contents.

 I don't think the change between populated and non-populated
 needs to change without a new heap being swapped in.

 I think you are using the distinction between populated and
 scannable to dodge the issue.  My point is that whenever we add
 some better control over whether matviews are considered
 too-stale-to-use, it is going to be important that the staleness
 state can be changed, in either direction, without a full rewrite
 of the matview's contents.  Whether you call that populated, or
 scannable, or stale, or warm fuzzy doesn't matter to me. 
 If we can't do that then the feature is not actually going to be
 of any great use.

I suspect that we will never support scanning of a non-populated
matview, but that other considerations for what is scannable will
be added.  That's not a word game, it's an important semantic
difference.

 And I want such updates to be transactional, because they'll very
 likely be connected to transactional updates of the matview's
 contents.

Yep, that is definitely my intent.

 So I continue to maintain that the current design is a dead end.

Not if you don't muddle multiple concepts as one thing.

 We'll get into all kinds of design trouble if we conflate these
 two separate concepts.

 Perhaps I still don't understand what useful distinction is
 there.  If there is a critical distinction, why is the current
 patch exposing only one state value to users?

We don't have any other criteria for what is scannable implemented
yet in this release, but perhaps you're right that we should have a
pg_relation_is_populated() function available to user in addition
to the pg_relation_is_scannable() function.  For the moment, the
latter would just return the result of calling the former.  I
hadn't thought the former was useful at the SQL level at this
point, but thinking about it now, I wonder whether populated
wouldn't be the more appropriate test for pg_dump to use.  I'm not
sure just yet, but it bears some thinking.  More on that tomorrow
after I sleep on it.

 I could probably add a hundred other desirable properties for a
 materialized view implementation; I was trying to list the ones
 which were still up-for-grabs for the initial 9.3
 implementation.  If we want to make progress on catching up to
 other major databases on this, it will take progress in many
 releases; and every one of those will lack something that
 someone wants.  OK, a lot of things that a lot of people want. 
 But if we can't commit something that is less than complete or
 less than perfect to make incremental change, we're not going to
 be getting anywhere very fast.

 Yes, exactly.  I am not sure why you are so desperate to have
 unlogged matviews in 9.3 that you are willing to risk finding
 yourself boxed in and unable to make future improvements that
 everybody agrees are useful.  ISTM that plain matviews are a
 perfectly good first-cut feature and we can add unlogged ones
 whenever we can think of a better way to do it.

I think matviews are a useful feature without an unlogged option;
but I think they will be significantly more useful with the option.
 I agree it's a judgment call whether the hacks needed to support
it for now are worth the benefit of the option.  Based on my
reading of the user needs and the code, I think it is worth the
trade-off, but I can understand arguments to the contrary.  The
lack of specifics on where problems would be has left me
unconvinced.  The most specific arguments so far have been with the
extend from zero page length during incremental maintenance
(which I hope I have made clear why I don't see a problem there,
but maybe I need to say more...), and the argument that we might
want to stuff metadata at the front of the heap to track this,
which is out of the question for 9.3 and would require the same
response later whether we use the zero-length hack for now or not.

 If I thought that the current hack for tracking the populated
 state would be permanent, or that it would be so hard to replace
 in a later release that it would impede further development, I
 would be the first one to rip it out.  I haven't been able to
 see such a risk, and nobody has adequately explained where they
 think the risk lies.

 You can't update that state transactionally, and you can't update
 it without a complete rewrite.

Right, the *populated* state, indicating whether the query has been
run to initially fill the matview with data, 

Re: [HACKERS] matview niceties: pick any two of these three

2013-05-03 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 What do you see that I'm missing?

TBH, if I had 20-20 foresight, we'd not be having this discussion:
either I could see that you're right and this patch isn't going to
cause us enormous pain, or I could put my finger on exactly where
and why it's going to hurt us.  But I can't do the latter today.
Nonetheless, this patch terrifies me.  It's ugly, it's a serious
layering violation, and it flies in the face of very-long-standing
assumptions about the semantics of heap storage.  My gut tells me
that we *will* regret shipping things this way.  Nor am I impressed
with the amount of functionality we're gaining by taking such a risk.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] matview niceties: pick any two of these three

2013-05-02 Thread Kevin Grittner
Tom has refactored where and how certain parts of the work get done
for materialized views, reducing issues with modularity violations.
I have been and will continue to review his changes to better
understand how the pieces of the code fit together, so hopefully he
won't need to do so much with future contributions.  He has not, so
far, changed functionality or the regression tests -- although he
has complained about things, of which the below is the all of what
remains as an issue for him as far as I know.  I may have missed
something; if so, it would be great to be reminded of what.

What remains is a choice between three alternatives on which no
consensus has been reached.  There are three things that, as fas as
I can tell, *everyone* agrees would be nice to be true about the
matview implementation for 9.3, but it does not seem feasible to
have more than two:

(1)  The ability to count on the results from a query which
references a matview to reflect valid data from *some* point in
time.

(2)  The ability to create unlogged materialized views.

(3)  The ability to consider a zero-length matview heap and a
matview heap with allocated-but-empty pages as logically identical.

Tom wants to ditch (2) to allow the others.  Robert wants to ditch
(1) to allow the others.  I want to ditch (3) to allow the others. 
Andres wants (3) and has not expressed an opinion on which he would
prefer to give up to get it.  I believe Josh Berkus has mentioned
how useful he thinks both (1) and (2) would be, without really
commenting on (3).

I am convinced that we can solve (3) in a later release without any
significant impact on those already using matviews, including
unlogged ones.  Tom and Robert have suggested that the current
implementation would paint us into a corner or pose a pg_upgrade
hazard, without any clear indication of the mechanism of the
problem.

The logjam has caused me to hold back on finishing up the direction
I prefer, for fear of conflicts with other work that someone else
may be trying to do.  Given that time is short, I'm going to apply
the patch which inserts five lines (two of them comment lines) on
the basis that at least the way that is currently implemented will
have no known bugs, and it's just not that much to rip back out if
we reach consensus on another direction.

How to we resolve this impasse?

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview niceties: pick any two of these three

2013-05-02 Thread Josh Berkus

 Tom wants to ditch (2) to allow the others.  Robert wants to ditch
 (1) to allow the others.  I want to ditch (3) to allow the others. 
 Andres wants (3) and has not expressed an opinion on which he would
 prefer to give up to get it.  I believe Josh Berkus has mentioned
 how useful he thinks both (1) and (2) would be, without really
 commenting on (3).

As I understand it, we don't currently have any mechanism in Postgres
which would cause allocated-but-empty pages.  That we *might* have such
a thing in 9.4 doesn't seem like a sufficient obstacle; we also might not.

Further, I don't think that pg_upgrade is really a red card here.
Matviews will be a new feature for 9.3.  If we end up having to say if
you use pg_upgrade to upgrade to 9.4, you will need to rebuild your
matviews afterwards, then that's what happens.  People are used to some
wonkiness in new features, and at this point the majority of our users
don't use pg_upgrade.

So, yes, I'd vote for (1) and (2) over (3), if that's the options which
make sense.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers