Re: [HACKERS] refresh materialized view concurrently
On Tue, Jul 9, 2013 at 12:50 PM, Kevin Grittner kgri...@ymail.com wrote:
> Thanks again!  New patch attached.

After a couple more attempts at trying to break it, I mark this as ready to go.

One small question: why do we use multiple unique indexes if they exist?  One index isn't enough?

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] refresh materialized view concurrently
Hitoshi Harada umi.tan...@gmail.com wrote:
> After a couple more attempts at trying to break it, I mark this as
> ready to go.

Thanks.

> One small question: why do we use multiple unique indexes if they exist?

Two reasons:

(1) By only matching up rows which test as equal on all columns used in unique indexes, we can use UPDATE for the matches rather than DELETE and INSERT, without fear of hitting a transient duplicate key error on one of the indexes.  Since any update to an indexed column prevents a HOT update, and results in the equivalent of a DELETE and an INSERT anyway, there's no performance loss from letting rows not match if *any* column which is part of *any* unique index doesn't match.

(2) The planner can use one of the unique indexes to optimize the diff phase.  How would we pick the one which is fastest?  By comparing for equality on all the columns used in all unique indexes, we let the planner decide.  I see no reason to try to second-guess it; just let it do its thing.

> One index isn't enough?

It would be enough for correctness, yes.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
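[Editor's illustration.]  The matching strategy Kevin describes can be sketched roughly as follows; the names (mv with a unique index on column a, newdata holding the freshly generated contents) are hypothetical, and the real patch builds its diff query internally through SPI rather than as a literal statement like this:

```sql
-- Sketch only, not the patch's exact SQL.  Join old and new contents on
-- every column covered by a unique index (here just "a").  Matched pairs
-- that differ in other columns can be handled with UPDATE; rows present
-- on only one side become DELETEs (old side) or INSERTs (new side).
SELECT mv.ctid AS old_tid, newdata.*
  FROM mv
  FULL JOIN newdata ON mv.a = newdata.a
 WHERE (mv.*) IS DISTINCT FROM (newdata.*);
```

Because the join condition mentions every unique-index column, the planner is free to pick whichever index makes the join cheapest, which is the point of reason (2) above.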
Re: [HACKERS] refresh materialized view concurrently
On Sat, Jul 6, 2013 at 9:20 AM, Kevin Grittner kgri...@ymail.com wrote:
> Hitoshi Harada umi.tan...@gmail.com wrote:
>> Oops!
>>
>> drop materialized view if exists mv;
>> drop table if exists foo;
>> create table foo(a, b) as values(1, 10);
>> create materialized view mv as select * from foo;
>> create unique index on mv(a);
>> insert into foo select * from foo;
>> refresh materialized view mv;
>> refresh materialized view concurrently mv;
>>
>> test=# refresh materialized view mv;
>> ERROR:  could not create unique index "mv_a_idx"
>> DETAIL:  Key (a)=(1) is duplicated.
>> test=# refresh materialized view concurrently mv;
>> REFRESH MATERIALIZED VIEW
>
> Indeed.  Thanks for the careful testing.  Fixed by scanning the temp
> table for duplicates before generating the diff:
>
> test=# refresh materialized view concurrently mv;
> ERROR:  new data for "mv" contains duplicate rows without any NULL columns
> DETAIL:  Row: (1,10)

I think the point is not to check for duplicate rows; it is about unique key constraint violations.  So if you change the original table foo to values(1, 10), (1, 20), the issue can still be reproduced.  In non-concurrent operation it is checked by reindex_index when swapping the heap, but apparently we are not doing the constraint check in concurrent mode.

>> [ matview with all columns covered by unique indexes fails ]
>
> Fixed.
>
>> Other than these, I've found the index is opened with NoLock, relying
>> on the ExclusiveLock on the parent matview; ALTER INDEX SET TABLESPACE
>> or something similar can run concurrently, but it is presumably safe.
>> DROP INDEX and REINDEX would be blocked by the ExclusiveLock.
>
> Since others were also worried that an index definition could be
> modified while another process is holding an ExclusiveLock on its
> table, I changed this.

OK, the others are resolved.

One thing I need to apologize for is make_temptable_name_n: I claimed the previous coding was a potential memory overrun, but the relation name is actually defined by make_new_heap, so unless that function generates an absurdly long name there is no possibility of producing a name that overruns NAMEDATALEN.  So +1 for reverting make_temptable_name_n, which was invented needlessly.

I've found another issue, regarding matview_maintenance_depth.  If an SPI call fails (pg_cancel_backend is quite possible) and errors out in the middle of processing, this value stays above zero, so subsequently we can issue DML on the matview.  We should make sure the value goes back to zero before jumping out of this function.

-- 
Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
Robert Haas robertmh...@gmail.com wrote:
> Hitoshi Harada umi.tan...@gmail.com wrote:
>> Other than these, I've found the index is opened with NoLock, relying
>> on the ExclusiveLock on the parent matview; ALTER INDEX SET TABLESPACE
>> or something similar can run concurrently, but it is presumably safe.
>> DROP INDEX and REINDEX would be blocked by the ExclusiveLock.
>
> I doubt very much that this is safe.  And even if it is safe today, I
> think it's a bad idea, because we're likely to try to reduce lock
> levels in the future.  Taking no lock on a relation we're opening, even
> an index, seems certain to be a bad idea.

What we're talking about is taking a look at the index definition while the indexed table involved is covered by an ExclusiveLock.  Why is that more dangerous than inserting entries into an index without taking a lock on that index while the indexed table is covered by a RowExclusiveLock, as happens on INSERT?  RowExclusiveLock is a much weaker lock, and we can't add entries to an index without looking at its definition.  Should we be taking out locks on every index for every INSERT?  If not, what makes that safe while merely looking at the definition is too risky?

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Kevin Grittner kgri...@ymail.com writes:
> Robert Haas robertmh...@gmail.com wrote:
>> I doubt very much that this is safe.  And even if it is safe today, I
>> think it's a bad idea, because we're likely to try to reduce lock
>> levels in the future.  Taking no lock on a relation we're opening,
>> even an index, seems certain to be a bad idea.

I'm with Robert on this.

> What we're talking about is taking a look at the index definition while
> the indexed table involved is covered by an ExclusiveLock.  Why is that
> more dangerous than inserting entries into an index without taking a
> lock on that index while the indexed table is covered by a
> RowExclusiveLock, as happens on INSERT?

I don't believe that that happens.  If it does, it's a bug.  Either the planner or the executor should be taking a lock on each index touched by a query.

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
On Wed, Jul 3, 2013 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kgri...@ymail.com writes:
>> What we're talking about is taking a look at the index definition
>> while the indexed table involved is covered by an ExclusiveLock.  Why
>> is that more dangerous than inserting entries into an index without
>> taking a lock on that index while the indexed table is covered by a
>> RowExclusiveLock, as happens on INSERT?
>
> I don't believe that that happens.  If it does, it's a bug.  Either the
> planner or the executor should be taking a lock on each index touched
> by a query.

It seems Kevin's right.  Not sure why that doesn't break.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Robert Haas robertmh...@gmail.com writes:
> On Wed, Jul 3, 2013 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I don't believe that that happens.  If it does, it's a bug.  Either
>> the planner or the executor should be taking a lock on each index
>> touched by a query.
>
> It seems Kevin's right.  Not sure why that doesn't break.

Are we somehow not going through ExecOpenIndices?

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
On Wed, Jul 3, 2013 at 10:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Are we somehow not going through ExecOpenIndices?

I dunno.  I just did a quick black-box test:

CREATE TABLE foo (a int primary key);
BEGIN;
INSERT INTO foo VALUES (1);
SELECT relation::regclass, locktype, mode, granted FROM pg_locks;

I get:

 relation | locktype      | mode             | granted
----------+---------------+------------------+---------
 pg_locks | relation      | AccessShareLock  | t
 foo      | relation      | RowExclusiveLock | t
          | virtualxid    | ExclusiveLock    | t
          | transactionid | ExclusiveLock    | t

No foo_pkey anywhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Robert Haas robertmh...@gmail.com writes:
> I dunno.  I just did a quick black-box test:
> [ begin; insert; inspect pg_locks ]
> No foo_pkey anywhere.

That proves nothing, as we don't keep such locks after the query (and there's no reason to AFAICS).  See ExecCloseIndices.

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
On 2013-07-03 11:08:32 -0400, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I dunno.  I just did a quick black-box test:
>> [ begin; insert; inspect pg_locks -- no foo_pkey anywhere ]
>
> That proves nothing, as we don't keep such locks after the query (and
> there's no reason to AFAICS).  See ExecCloseIndices.

Should be easy enough to test by hacking LOCK TABLE to lock indexes and taking out a conflicting lock (SHARE?) in a second transaction.  I wonder if we shouldn't just generally allow that; I remember relaxing that check before (when playing with CREATE/DROP CONCURRENTLY, afair).

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] refresh materialized view concurrently
Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Tom Lane t...@sss.pgh.pa.us wrote:
>>> Are we somehow not going through ExecOpenIndices?
>>
>> I dunno.  I just did a quick black-box test:
>> [ begin; insert; without commit ]
>> No foo_pkey anywhere.
>
> That proves nothing, as we don't keep such locks after the query (and
> there's no reason to AFAICS).  See ExecCloseIndices.

OK.  I had seen that no locks were held after the insert and wasn't aware that we acquired and then released them for each insert within a transaction.  On the other hand, we acquire locks on all indexes even for a HOT UPDATE which uses a seqscan, and hold those until end of transaction.  Is there a reason for that?

I suppose that since a concurrent refresh is very likely to lock all these indexes with RowExclusiveLock anyway, as a result of the DELETE, UPDATE, and INSERT statements subsequently issued through SPI, I might as well acquire that lock when I look at the definition and not release it -- so that the subsequent lock acquisitions are local to the backend, and therefore faster.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Kevin Grittner kgri...@ymail.com writes:
> OK.  I had seen that no locks were held after the insert and wasn't
> aware that we acquired and then released them for each insert within a
> transaction.  On the other hand, we acquire locks on all indexes even
> for a HOT UPDATE which uses a seqscan, and hold those until end of
> transaction.  Is there a reason for that?

Sounds dubious to me; although in the HOT code it might be that there's no convenient place to release the index locks.

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kgri...@ymail.com writes:
>> we acquire locks on all indexes even for a HOT UPDATE which uses a
>> seqscan, and hold those until end of transaction.  Is there a reason
>> for that?
>
> Sounds dubious to me; although in the HOT code it might be that there's
> no convenient place to release the index locks.

Further testing shows that any UPDATE or DELETE statement acquires a RowExclusiveLock on every index on the table and holds it until end of transaction, whether or not any rows are affected and regardless of whether an index scan or a seqscan is used.  In fact, just an EXPLAIN of an UPDATE or DELETE does so.  It is only INSERT which releases the locks at the end of the statement.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
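[Editor's illustration.]  Kevin's observation can be reproduced from psql with a throwaway table; the table and index names below are hypothetical examples, and the exact pg_locks contents will vary by session:

```sql
-- Sketch of the test Kevin describes: even a bare EXPLAIN of an UPDATE
-- locks the table's indexes, and the locks persist to end of transaction.
CREATE TABLE t (a int PRIMARY KEY, b int);

BEGIN;
EXPLAIN UPDATE t SET b = b + 1;   -- planning alone touches the indexes
SELECT relation::regclass, mode
  FROM pg_locks
 WHERE locktype = 'relation';
-- Expect t_pkey to appear with RowExclusiveLock here, held until
-- COMMIT/ROLLBACK; after a plain INSERT, by contrast, the index lock
-- is released at the end of the statement.
ROLLBACK;
```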
Re: [HACKERS] refresh materialized view concurrently
Kevin Grittner kgri...@ymail.com writes:
> Further testing shows that any UPDATE or DELETE statement acquires a
> RowExclusiveLock on every index on the table and holds it until end of
> transaction, whether or not any rows are affected and regardless of
> whether an index scan or a seqscan is used.  In fact, just an EXPLAIN
> of an UPDATE or DELETE does so.  It is only INSERT which releases the
> locks at the end of the statement.

Hm, possibly the planner is taking those locks.  I don't think the executor's behavior is any different.  But the planner only cares about indexes in SELECT/UPDATE/DELETE, since an INSERT has no interest in scanning the target table.

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
On Thu, Jun 27, 2013 at 12:19 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
> On Wed, Jun 26, 2013 at 1:38 AM, Kevin Grittner kgri...@ymail.com wrote:
>> New version attached.
>
> Will take another look.

Oops!

drop materialized view if exists mv;
drop table if exists foo;
create table foo(a, b) as values(1, 10);
create materialized view mv as select * from foo;
create unique index on mv(a);
insert into foo select * from foo;
refresh materialized view mv;
refresh materialized view concurrently mv;

test=# refresh materialized view mv;
ERROR:  could not create unique index "mv_a_idx"
DETAIL:  Key (a)=(1) is duplicated.
test=# refresh materialized view concurrently mv;
REFRESH MATERIALIZED VIEW

Here's one more.

create table foo(a, b, c) as values(1, 2, 3);
create materialized view mv as select * from foo;
create unique index on mv (a);
create unique index on mv (b);
create unique index on mv (c);
insert into foo values(2, 3, 4);
insert into foo values(3, 4, 5);
refresh materialized view concurrently mv;

test=# refresh materialized view concurrently mv;
ERROR:  syntax error at or near "FROM"
LINE 1: UPDATE public.mv x SET FROM pg_temp_2.pg_temp_16615_2 d WHE...
                               ^
QUERY:  UPDATE public.mv x SET FROM pg_temp_2.pg_temp_16615_2 d WHERE d.tid IS NOT NULL AND x.ctid = d.tid

Other than these, I've found the index is opened with NoLock, relying on the ExclusiveLock on the parent matview; ALTER INDEX SET TABLESPACE or something similar can run concurrently, but it is presumably safe.  DROP INDEX and REINDEX would be blocked by the ExclusiveLock.

-- 
Hitoshi Harada
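[Editor's note, an interpretation not spelled out in the thread.]  A plausible reading of this second failure: when every column of the matview is covered by some unique index, the diff matches rows on all columns, so any matched pair is already identical and the generated UPDATE is left with an empty SET list, which is a syntax error:

```sql
-- What the patch generated (malformed: nothing between SET and FROM):
--   UPDATE public.mv x SET FROM pg_temp_2.pg_temp_16615_2 d
--   WHERE d.tid IS NOT NULL AND x.ctid = d.tid;
--
-- With no columns left outside the unique indexes, matched rows cannot
-- differ, so a fix can simply skip the UPDATE step and run only the
-- DELETE/INSERT legs of the diff.
```

This matches Kevin's later note ("[ matview with all columns covered by unique indexes fails ] Fixed.") that the case needed special handling.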
Re: [HACKERS] refresh materialized view concurrently
On Tue, Jul 2, 2013 at 4:02 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
> Other than these, I've found the index is opened with NoLock, relying
> on the ExclusiveLock on the parent matview; ALTER INDEX SET TABLESPACE
> or something similar can run concurrently, but it is presumably safe.
> DROP INDEX and REINDEX would be blocked by the ExclusiveLock.

I doubt very much that this is safe.  And even if it is safe today, I think it's a bad idea, because we're likely to try to reduce lock levels in the future.  Taking no lock on a relation we're opening, even an index, seems certain to be a bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
On Tue, Jun 25, 2013 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote:
> On Fri, Jun 21, 2013 at 5:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
>> If I don't miss something, the requirement for the CONCURRENTLY option
>> is to allow simple SELECT readers to read the matview concurrently
>> while the view is populating the new data, while INSERT/UPDATE/DELETE
>> and SELECT FOR UPDATE/SHARE are still blocked.  So I wonder why it is
>> not possible just to acquire ExclusiveLock on the matview while
>> populating the data and swap the relfile by taking a brief
>> AccessExclusiveLock.  This lock escalation is no deadlock hazard, I
>> suppose, because a concurrent operation would block the other at the
>> point the ExclusiveLock is acquired, and ExclusiveLock conflicts with
>> AccessExclusiveLock.  Then you don't need the complicated SPI logic or
>> the unique key index dependency.
>
> This is no good.  One, all lock upgrades are deadlock hazards.  In this
> case, that plays out as follows: suppose that the session running
> REFRESH MATERIALIZED VIEW CONCURRENTLY also holds a lock on something
> else.  Some other process takes an AccessShareLock on the materialized
> view and then tries to take a conflicting lock on the other object.
> Kaboom, deadlock.  Granted, the chances of that happening in practice
> are small, but it IS the reason why we typically try to avoid having
> long-running operations perform lock upgrades.  Users get really
> annoyed when their DDL runs for an hour and then rolls back.

OK, that's not safe.  What I was thinking was something similar to compare-and-swap, where the whole operation is atomic under an AccessExclusiveLock.  What if we release the ExclusiveLock once the new matview heap has been created and re-acquire AccessExclusiveLock before trying to swap?

Note a matview is a little different from an index, which I know people are discussing in the REINDEX CONCURRENTLY thread, in that the content of a matview does not change incrementally (at least at this point); it only changes wholesale, in the swapping operation performed by the same REFRESH MATERIALIZED VIEW command.  The only race condition is that between releasing the ExclusiveLock and re-acquiring the AccessExclusiveLock, someone else can go ahead with the same operation and create another new heap.  If that happens, let's abort, because I guess that's the way our transaction system works anyway; in the case of unique key index insertion, for example, if I find another guy inserting the same value into the index, I wait for him to finish his work, and if his transaction commits I give up; otherwise I go ahead.  Maybe it's annoying if an hour-long operation finally gets aborted, but my purpose is actually achieved by the other guy.  If the primary goal of this feature is to let readers read the matview concurrently, it should be OK?  Hmm, but in such cases the first guy always wins, and the second guy who may come an hour later loses, so we cannot get the result of the latest command...  I still wonder whether there should be some way.

> Two, until we get MVCC catalog scans, it's not safe to update any
> system catalog tuple without an AccessExclusiveLock on some locktag
> that will prevent concurrent catalog scans for that tuple.  Under
> SnapshotNow semantics, concurrent readers can fail to see that the
> object is present at all, leading to mysterious failures - especially
> if some of the object's catalog scans are seen and others are missed.

So what I'm saying above is to take AccessExclusiveLock for swapping the relfile in the catalog.  This doesn't violate your statement, I suppose.  I'm actually still skeptical about MVCC catalogs, because even if you can make catalog lookups MVCC, the relfile on the filesystem is not MVCC.  If session 1 changes the relfilenode in pg_class, commits the transaction, and deletes the old relfile from the filesystem, another concurrent session 2 that took a snapshot just before session 1 made the change keeps running and tries to open this relation, grabbing the old relfilenode and opening it from the filesystem -- ERROR: relfile not found.  So everyone actually needs to see up-to-date information that is synchronized with what the filesystem says, and that's SnapshotNow.  In my experimental thought above about the compare-and-swap approach, the compare phase needs to see the most recent valid information, otherwise it never notices that someone else did something new.

Since I haven't read the whole thread, maybe this has already been discussed, but it would help if you could clarify this concern.

Thanks,

-- 
Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
On Wed, Jun 26, 2013 at 1:38 AM, Kevin Grittner kgri...@ymail.com wrote:
> Hitoshi Harada umi.tan...@gmail.com wrote:
>> I spent a few hours to review the patch.
>
> Thanks!
>
>> As far as I can tell, the overall approach is as follows.
>>
>> - create a new temp heap as non-concurrent does, but with
>>   ExclusiveLock on the matview, so that readers wouldn't be blocked
>
> Non-concurrent creates the heap in the matview's tablespace and
> namespace, so the "temp" part is different in concurrent generation.
> This difference is why concurrent can be faster when few rows change.

It's still not clear to me why you need temp in concurrent and not in non-concurrent.  If this type of operation always creates a temp table and just swaps it with the existing one, why can't we just make it temp always?  And if performance is the only concern, is temp better than just turning off WAL for the table, or an UNLOGGED table?

> Also, before the next step there is an ANALYZE of the temp table, so
> the planner can make good choices in the next step.
>
>> - with this temp table and the matview, query FULL JOIN and extract
>>   the difference between the original matview and the temp heap
>>   (via SPI)
>
> Right; into another temp table.
>
>> - this operation requires a unique index for performance reasons (or
>>   correctness reasons too)
>
> It is primarily for correctness in the face of duplicate rows which
> have no nulls.  Do you think the reasons need to be better documented
> with comments?

Ah, yes; even after looking at the patch I was confused about whether it was for performance or correctness.  It's a shame we cannot refresh it concurrently if we have duplicate rows in the matview.  I thought it would make sense to allow it without a unique key if it were only a performance tradeoff.

> I also modified the confusing error message to something close to the
> suggestion from Robert.  What to do about the Assert that the matview
> is not a system relation seems like material for a separate patch.
> After review, I'm inclined to remove the test altogether, so that
> extensions can create matviews in pg_catalog.

I like this better.

> New version attached.

Will take another look.

Thanks,

-- 
Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
On 2013-06-27 00:12:07 -0700, Hitoshi Harada wrote:
>> Two, until we get MVCC catalog scans, it's not safe to update any
>> system catalog tuple without an AccessExclusiveLock on some locktag
>> that will prevent concurrent catalog scans for that tuple.  Under
>> SnapshotNow semantics, concurrent readers can fail to see that the
>> object is present at all, leading to mysterious failures - especially
>> if some of the object's catalog scans are seen and others are missed.
>
> So what I'm saying above is to take AccessExclusiveLock for swapping
> the relfile in the catalog.  This doesn't violate your statement, I
> suppose.  I'm actually still skeptical about MVCC catalogs, because
> even if you can make catalog lookups MVCC, the relfile on the
> filesystem is not MVCC.  If session 1 changes the relfilenode in
> pg_class, commits the transaction, and deletes the old relfile from
> the filesystem, another concurrent session 2 that took a snapshot just
> before session 1 made the change keeps running and tries to open this
> relation, grabbing the old relfilenode and opening it from the
> filesystem -- ERROR: relfile not found.

We can play cute tricks akin to what CREATE INDEX CONCURRENTLY currently does, i.e. wait for all other transactions that could have possibly seen the old relfilenode (they must have at least a share lock on the relation) before dropping the actual storage.  The reason we cannot currently do that in most scenarios is that we cannot perform transactional/MVCC updates of non-exclusively-locked objects, due to the SnapshotNow problem of seeing multiple or no versions of a row during a single scan.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] refresh materialized view concurrently
Andres Freund and...@2ndquadrant.com wrote:
> We can play cute tricks akin to what CREATE INDEX CONCURRENTLY
> currently does, i.e. wait for all other transactions that could have
> possibly seen the old relfilenode (they must have at least a share
> lock on the relation) before dropping the actual storage.

Not only would that be slower than the submitted patch in cases where only a few rows differ, but it could be waiting to swap in that new heap for an unbounded amount of time.  I think the current patch will play nicer with incremental maintenance than what you suggest here.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Hitoshi Harada umi.tan...@gmail.com wrote: Kevin Grittner kgri...@ymail.com wrote: Hitoshi Harada umi.tan...@gmail.com wrote: As far as I can tell, the overall approach is as follows. - create a new temp heap as non-concurrent does, but with ExclusiveLock on the matview, so that reader wouldn't be blocked Non-concurrent creates the heap in the matview's tablespace and namespace, so the temp part is different in concurrent generation. This difference is why concurrent can be faster when few rows change. It's still not clear to me why you need temp in concurrent and not in non-concurrent. Well, temp tables can be in an entirely different tablespace, so you can't just move the heap of a temp table into place as the new heap for the matview (as we do for a non-concurrent REFRESH) -- at least not without potentially copying everything an extra time. For concurrent we are modifying the existing matview heap, and the freshly generated set of values, as well as the diff table, are just needed, well, temporarily. That makes the behavior of temp tables ideal. Not only are they not logged, they are potentially placed on faster tablespaces, and the data written to them might never be written to disk: http://www.postgresql.org/docs/9.2/interactive/runtime-config-resource.html#GUC-TEMP-BUFFERS If this type of operations is always creating temp table and just swap it with existing one, why can't we just make it temp always? Because of the potentially differrent tablespaces. And if the performance is the only concern, is temp better than just turning off WAL for the table or UNLOGGED table? Yes, it is. - this operation requires unique index for performance reason (or correctness reason too) It is primarily for correctness in the face of duplicate rows which have no nulls. Do you think the reasons need to be better documented with comments? Ah, yes, even after looking at patch I was confused if it was for performance or correctness. 
This is the part of the function's comment which attempts to explain the problem. * This join cannot work if there are any * duplicated rows in either the old or new versions, in the sense that every * column would compare as equal between the two rows. It does work correctly * in the face of rows which have at least one NULL value, with all non-NULL * columns equal. The behavior of NULLs on equality tests and on UNIQUE * indexes turns out to be quite convenient here; the tests we need to make * are consistent with default behavior. If there is at least one UNIQUE * index on the materialized view, we have exactly the guarantee we need. By * joining based on equality on all columns which are part of any unique * index, we identify the rows on which we can use UPDATE without any problem. * If any column is NULL in either the old or new version of a row (or both), * we must use DELETE and INSERT, since there could be multiple rows which are * NOT DISTINCT FROM each other, and we could otherwise end up with the wrong * number of occurrences in the updated relation. I'm open to suggestions on better wording. As an example of the way the full join gets into trouble with duplicate rows when used for a diff, see this example: test=# create table old (c1 int, c2 int); CREATE TABLE test=# create table new (c1 int, c2 int); CREATE TABLE test=# insert into old values test-# (1,1),(1,2),(1,2),(1,2),(1,3),(1,null),(1,null); INSERT 0 7 test=# insert into new values test-# (1,1),(1,2),(1,2),(1,2),(1,2),(1,2),(1,4),(1,null),(1,null),(1,null); INSERT 0 10 At this point it is clear that we need to add two rows with values (1,2) and we need to wind up with one more row with values (1,null) than we already have. We also need to delete (1,3) and add (1,4). 
But full join logic fails to get things right for the case of duplicate rows with no nulls:

test=# select old, new
test-# from old
test-# full join new on old.c1 = new.c1 and old.c2 = new.c2
test-# where (old.*) is distinct from (new.*);
  old  |  new
-------+-------
 (1,3) |
 (1,)  |
 (1,)  |
       | (1,4)
       | (1,)
       | (1,)
       | (1,)
(7 rows)

It's a shame we cannot refresh it concurrently if we have duplicate rows in the matview. I thought it would make sense to allow it without a unique key if it was only a performance tradeoff. Well, we *could* allow it without a unique index, but the code would be more complex and significantly slower. I think we would still want to handle it the way the current patch does when a unique index is present, even if we have a way to handle cases where such an index is not present. Even if I were convinced it was worthwhile to support the more general case, I would want to commit this patch first, and add the more complicated code as a follow-on patch. At this point I'm not convinced of the value of supporting concurrent refresh of materialized views which contain duplicate rows, nor of the benefit of allowing it to work ten
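For contrast, here is a rough sketch (my own illustration, not code from the patch) of the kind of set-based merge that becomes safe once a unique index guarantees there are no fully-duplicated rows. Plain tables stand in for the matview heap and the freshly generated temp heap, since SQL-level DML on a matview is not allowed, and all names here are invented:

```sql
-- cur stands in for the matview heap, fresh for the newly generated data;
-- the UNIQUE constraint on k is what makes one-to-one matching possible.
CREATE TABLE cur   (k int PRIMARY KEY, v int);
CREATE TABLE fresh (k int PRIMARY KEY, v int);
INSERT INTO cur   VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO fresh VALUES (1, 10), (2, 25), (4, 40);

-- Delete rows whose key is gone from the new data: removes (3,30).
DELETE FROM cur
WHERE NOT EXISTS (SELECT 1 FROM fresh f WHERE f.k = cur.k);

-- Update rows whose key survives but whose payload changed: (2,20) becomes (2,25).
UPDATE cur
SET v = f.v
FROM fresh f
WHERE cur.k = f.k AND cur.v IS DISTINCT FROM f.v;

-- Insert rows with keys not previously present: adds (4,40).
INSERT INTO cur
SELECT * FROM fresh f
WHERE NOT EXISTS (SELECT 1 FROM cur c WHERE c.k = f.k);
```

With duplicate keyless rows, as in the old/new example above, none of these three statements can be written to produce the right multiplicities, which is the point Kevin is making.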
Re: [HACKERS] refresh materialized view concurrently
On Fri, Jun 21, 2013 at 5:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote: If I don't miss something, the requirement for the CONCURRENTLY option is to allow simple SELECT readers to read the matview concurrently while the view is populating the new data, while INSERT/UPDATE/DELETE and SELECT FOR UPDATE/SHARE are still blocked. So, I wonder why it is not possible just to acquire ExclusiveLock on the matview while populating the data and swap the relfile by taking a small AccessExclusiveLock. This lock escalation is no deadlock hazard, I suppose, because concurrent operations would block each other at the point ExclusiveLock is acquired, and ExclusiveLock conflicts with AccessExclusiveLock. Then you don't need the complicated SPI logic or the unique key index dependency. This is no good. One, all lock upgrades are deadlock hazards. In this case, that plays out as follows: suppose that the session running REFRESH MATERIALIZED VIEW CONCURRENTLY also holds a lock on something else. Some other process takes an AccessShareLock on the materialized view and then tries to take a conflicting lock on the other object. Kaboom, deadlock. Granted, the chances of that happening in practice are small, but it IS the reason why we typically try to avoid having long-running operations perform lock upgrades. Users get really annoyed when their DDL runs for an hour and then rolls back. Two, until we get MVCC catalog scans, it's not safe to update any system catalog tuple without an AccessExclusiveLock on some locktag that will prevent concurrent catalog scans for that tuple. Under SnapshotNow semantics, concurrent readers can fail to see that the object is present at all, leading to mysterious failures - especially if some of the object's catalog entries are seen and others are missed. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
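To make the lock-upgrade hazard concrete, here is one possible interleaving, with hypothetical session and object names (this assumes a REFRESH implementation that upgrades from ExclusiveLock to AccessExclusiveLock for the heap swap, which is exactly what the patch avoids):

```sql
-- Session 1: hypothetical lock-upgrading REFRESH, which also holds a lock on table t
--   BEGIN;
--   LOCK TABLE t IN SHARE MODE;
--   REFRESH MATERIALIZED VIEW mv;   -- takes ExclusiveLock on mv, starts populating

-- Session 2:
--   BEGIN;
--   SELECT count(*) FROM mv;                -- takes AccessShareLock on mv
--   LOCK TABLE t IN ACCESS EXCLUSIVE MODE;  -- blocks behind session 1's lock on t

-- Session 1 (continuing):
--   ...tries to upgrade to AccessExclusiveLock on mv to swap the relfilenode,
--   which conflicts with session 2's AccessShareLock on mv.  Each session now
--   waits for the other: deadlock, and the hour-long refresh rolls back.
```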
Re: [HACKERS] refresh materialized view concurrently
On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. I spent a few hours to review the patch. As far as I can tell, the overall approach is as follows. - create a new temp heap as non-concurrent does, but with ExclusiveLock on the matview, so that readers wouldn't be blocked - with this temp table and the matview, query FULL JOIN and extract the difference between the original matview and the temp heap (via SPI) - this operation requires a unique index for performance reasons (or correctness reasons too) - run ANALYZE on this diff table - run UPDATE, INSERT and DELETE via SPI, to do the merge - close these temp heaps If I don't miss something, the requirement for the CONCURRENTLY option is to allow simple SELECT readers to read the matview concurrently while the view is populating the new data, while INSERT/UPDATE/DELETE and SELECT FOR UPDATE/SHARE are still blocked. So, I wonder why it is not possible just to acquire ExclusiveLock on the matview while populating the data and swap the relfile by taking a small AccessExclusiveLock. This lock escalation is no deadlock hazard, I suppose, because concurrent operations would block each other at the point ExclusiveLock is acquired, and ExclusiveLock conflicts with AccessExclusiveLock. Then you don't need the complicated SPI logic or the unique key index dependency. Assuming I'm asking something wrong and going for the current approach, here's what I've found in the patch. - I'm not sure if the unique key index requirement is reasonable or not, because users only add the CONCURRENTLY option and are not asking for a merge or an incremental update. - This could overflow the diffname buffer. 
+ quoteRelationName(tempname, tempRel);
+ strcpy(diffname, tempname);
+ strcpy(diffname + strlen(diffname) - 1, "_2\"");

- As others pointed out, quoteOneName can be replaced with quote_identifier - This looks harmless, but I wonder if you need to change relkind.

*** 665,672 **** make_new_heap(Oid OIDOldHeap, Oid NewTableSpace)
      OldHeap->rd_rel->relowner,
      OldHeapDesc,
      NIL,
!     OldHeap->rd_rel->relkind,
!     OldHeap->rd_rel->relpersistence,
      false,
      RelationIsMapped(OldHeap),
      true,
--- 680,687 ----
      OldHeap->rd_rel->relowner,
      OldHeapDesc,
      NIL,
!     RELKIND_RELATION,
!     relpersistence,
      false,
      RelationIsMapped(OldHeap),
      true,

Since OldHeap->rd_rel->relkind has been working with 'm', too, not only 'r'? - I found the two additional parameters on make_new_heap ugly, but couldn't come up with a better solution. Maybe we can pass the Relation of the old heap to the function instead of the Oid.. That's about it from me. Thanks, -- Hitoshi Harada
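The FULL JOIN diff step summarized in the review above is conceptually similar to the following (table and column names are illustrative only; the actual patch assembles the query string internally, quoting identifiers, and runs it via SPI):

```sql
-- mv holds the current matview contents; newdata is the freshly built temp heap.
-- The join condition is equality on every column belonging to any unique index
-- (here just column a); fully identical matched pairs are filtered out, so the
-- diff contains only rows that need some kind of change.
CREATE TEMP TABLE diff AS
SELECT mv.ctid AS tid, newdata.*
FROM mv
FULL JOIN newdata ON mv.a = newdata.a
WHERE (mv.*) IS DISTINCT FROM (newdata.*);
-- tid non-NULL, new columns present   => UPDATE the matview row at tid in place
-- tid non-NULL, new side entirely NULL => DELETE the matview row at tid
-- tid NULL                             => INSERT the new row
```

Joining only on the unique-index columns is what lets matched-but-changed rows become UPDATEs, while the unique index guarantees the match is one-to-one.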
Re: [HACKERS] refresh materialized view concurrently
On Fri, Jun 21, 2013 at 2:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. I spent a few hours to review the patch. Oh, BTW, though it is not part of this patch, I came across this.

! /*
!  * We're not using materialized views in the system catalogs.
!  */
  Assert(!IsSystemRelation(matviewRel));

Of course we don't have a builtin matview on a system catalog, but it is possible to create such a thing with allow_system_table_mods=true, so the Assert doesn't look good to me. Thanks, -- Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
On 2013-06-21 02:43:23 -0700, Hitoshi Harada wrote: On Fri, Jun 21, 2013 at 2:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. I spent a few hours to review the patch. Oh, BTW, though it is not part of this patch, I came across this.

! /*
!  * We're not using materialized views in the system catalogs.
!  */
  Assert(!IsSystemRelation(matviewRel));

Of course we don't have a builtin matview on a system catalog, but it is possible to create such a thing with allow_system_table_mods=true, so the Assert doesn't look good to me. Anybody using allow_system_table_mods gets to keep the pieces. There are so many ways to break just about everything using it that I don't think worrying much makes sense. If you want you could replace that by an elog(), but... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
On 17 June 2013 00:43, Kevin Grittner kgri...@ymail.com wrote: Especially when one is known to be better than the other already. What is the hypothetical technique you're arguing is inferior? For my own part, I haven't gotten beyond the phase of knowing that to meet all requests for the feature, it would need to be available at about the same point that AFTER EACH STATEMENT triggers fire, but that it should not involve any user-written triggers. Have you implemented something similar to what you think I might be considering? Do you have benchmark results? Can you share details? Recording the changeset required by replication is known to be more efficient using WAL-based extraction than using triggers. WAL writes are effectively free, and using WAL concentrates the reads to avoid random I/O in large databases. That would be the most suitable approach for continuously updated matviews, or frequent updates. Extraction using multiple snapshots is also possible, using a technique similar to the CONCURRENTLY mechanism. That would require re-scanning the whole table, which might be overkill depending upon the number of changes. That would work for reasonably infrequent updates. Given that we also want to do concurrent CLUSTER and ALTER TABLE ... SET TABLESPACE using changeset extraction I think it's time that discussion happened on hackers. No objections to that here; but please don't hijack this thread for that discussion. There are multiple features all requiring efficient change set extraction. It seems extremely relevant to begin discussing what that mechanism might be in each case, so we don't develop 2 or even 3 different ones while everybody ignores each other. As you said, we should be helping each other and working together, and I agree. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
On 14.06.2013 19:05, Kevin Grittner wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I must say this seems a bit pointless on its own. But if it's a stepping stone to incremental maintenance, I have no objections. I didn't need to touch very much outside of matview-specific files for this. My biggest concern is that I needed two small functions which did *exactly* what some static functions in ri_triggers.c were doing and couldn't see where the best place to share them from was. For the moment I just duplicated them, but my hope would be that they could be put in a suitable location and called from both places, rather than duplicating the 30-some lines of code. The function signatures are: void quoteOneName(char *buffer, const char *name) void quoteRelationName(char *buffer, Relation rel) I'd just use quote_identifier and quote_qualified_identifier instead. I didn't understand this error message:

+ if (!foundUniqueIndex)
+     ereport(ERROR,
+             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+              errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view")));
+

What does that mean? - Heikki
Re: [HACKERS] refresh materialized view concurrently
Simon Riggs si...@2ndquadrant.com wrote: There are multiple features all requiring efficient change set extraction. It seems extremely relevant to begin discussing what that mechanism might be in each case Changeset extraction has nothing to do with this patch, and cannot possibly be useful for it. Please keep discussion which is completely unrelated to this patch off this thread. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
On 17 June 2013 12:13, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 14.06.2013 19:05, Kevin Grittner wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I must say this seems a bit pointless on its own. But if it's a stepping stone to incremental maintenance, I have no objections. There are generally 4 kinds of mat view:
1. Transactionally updated
2. Incremental update, eventually consistent
3. Incremental update, regular refresh
4. Full refresh
At the moment we only have type 4 and it holds a full lock while it runs. We definitely need a CONCURRENTLY option and this is it. Implementing the other types won't invalidate what we currently have, so this makes sense to me. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
On 17 June 2013 13:15, Kevin Grittner kgri...@ymail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: There are multiple features all requiring efficient change set extraction. It seems extremely relevant to begin discussing what that mechanism might be in each case Changeset extraction has nothing to do with this patch, and cannot possibly be useful for it. Please keep discussion which is completely unrelated to this patch off this thread. Kevin, You mentioned incremental maintenance in your original post and I have been discussing it. Had you not mentioned it, I doubt I would have thought of it. But since you did mention it, and it's clearly an important issue, it seems relevant to have discussed it here and now. I'm happy to wait for you to start the thread discussing it directly though. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
Heikki Linnakangas hlinnakan...@vmware.com wrote: On 14.06.2013 19:05, Kevin Grittner wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I must say this seems a bit pointless on its own. I completely disagree. When I read what people were posting about the materialized view creation that went into 9.3, there were many comments by people that they can't use it until the materialized views can be refreshed without blocking readers. There is a clear need for this. It doesn't do much to advance incremental maintenance, but it is a much smaller patch which will make matviews usable by a lot of people who can't use the initial feature set. I didn't understand this error message:

+ if (!foundUniqueIndex)
+     ereport(ERROR,
+             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+              errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view")));
+

What does that mean? It means that the REFRESH MATERIALIZED VIEW CONCURRENTLY command cannot be used on a materialized view unless it has at least one UNIQUE index which is not partial (i.e., there is no WHERE clause) and is not indexing on an expression (i.e., the index is entirely on bare column names). Set logic to do the diff is hard to get right if the tables are not proper sets (i.e., they contain duplicate rows). I can see at least three ways it *could* be done, but all of them are much more complex and significantly slower. With a UNIQUE index on some set of columns in all rows the correct guarantees exist to use fast set logic. It isn't that it's needed for access; it is needed to provide a guarantee that there is no row without NULLs that exactly duplicates another row. 
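As a concrete illustration of the requirement Kevin describes (object names here are invented): of the three unique indexes below, only the first would satisfy the check, because the second indexes an expression and the third is partial:

```sql
CREATE TABLE foo (a int, b text);
CREATE MATERIALIZED VIEW mv AS SELECT a, b FROM foo;

CREATE UNIQUE INDEX mv_plain ON mv (a);                -- qualifies: bare columns, no WHERE clause
CREATE UNIQUE INDEX mv_expr  ON mv ((lower(b)));       -- excluded: expression index
CREATE UNIQUE INDEX mv_part  ON mv (a) WHERE a > 0;    -- excluded: partial index
```

An expression or partial index cannot serve here because it does not guarantee uniqueness of bare column values across all rows of the view, which is exactly the guarantee the diff logic relies on.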
-- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Simon Riggs si...@2ndquadrant.com wrote: Kevin Grittner kgri...@ymail.com wrote: Changeset extraction has nothing to do with this patch, and cannot possibly be useful for it. Please keep discussion which is completely unrelated to this patch off this thread. You mentioned incremental maintenance in your original post and I have been discussing it. Had you not mentioned it, I doubt I would have thought of it. But since you did mention it, and it's clearly an important issue, it seems relevant to have discussed it here and now. What I said was that I wanted to get this out of the way before I started working on incremental maintenance. I'm happy to wait for you to start the thread discussing it directly though. Cool. -Kevin -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com: + errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view"))); Maybe my English is failing me here, but I don’t understand the “just” part. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Re: [HACKERS] refresh materialized view concurrently
Nicolas Barbier nicolas.barb...@gmail.com wrote: 2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com: + errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view"))); Maybe my English is failing me here, but I don’t understand the “just” part. It means that the index must not use any expressions in the list of what it's indexing on -- only column names. Suggestions for better wording would be welcome. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
On Mon, Jun 17, 2013 at 11:21 AM, Kevin Grittner kgri...@ymail.com wrote: Nicolas Barbier nicolas.barb...@gmail.com wrote: 2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com: + errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view"))); Maybe my English is failing me here, but I don’t understand the “just” part. It means that the index must not use any expressions in the list of what it's indexing on -- only column names. Suggestions for better wording would be welcome. Random idea: ERROR: materialized view "%s" does not have a unique key Perhaps augmented with: HINT: Create a UNIQUE btree index with no WHERE clause on one or more columns of the materialized view. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
On 06/17/2013 04:13 AM, Heikki Linnakangas wrote: On 14.06.2013 19:05, Kevin Grittner wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I must say this seems a bit pointless on its own. But if it's a stepping stone to incremental maintenance, I have no objections. Actually, CONCURRENTLY was cited as the #1 deficiency for the matview feature for 9.3. With it, the use-case for Postgres matviews is broadened considerably. So it's very valuable on its own, even if (for example) INCREMENTAL didn't get done for 9.3. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] refresh materialized view concurrently
On 14 June 2013 17:05, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. Is there a reason to keep the non-concurrent behaviour? Anybody that wants to explicitly lock should just run a LOCK statement. Can't we treat behaviour when fully locked as an optimisation, so we can just do the right thing without extra thought and keywords? It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. Incremental maintenance will be very straightforward using the logical changeset extraction code Andres is working on. Having two parallel mechanisms for changeset extraction in one release seems like a waste of time. Especially when one is known to be better than the other already. Given that we also want to do concurrent CLUSTER and ALTER TABLE ... SET TABLESPACE using changeset extraction I think it's time that discussion happened on hackers. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
Simon Riggs si...@2ndquadrant.com wrote: On 14 June 2013 17:05, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. Is there a reason to keep the non-concurrent behaviour? Yes. For initial population, truncation, and replacement of contents when more than a small percentage of rows are affected. Anybody that wants to explicitly lock should just run a LOCK statement. Can't we treat behaviour when fully locked as an optimisation, so we can just do the right thing without extra thought and keywords? Are you suggesting that we use heap replacement or DML depending on what heavyweight locks are held when the statement is executed? It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. Incremental maintenance will be very straightforward using the logical changeset extraction code Andres is working on. At most, changeset extraction will help with obtaining the initial delta for the base relations, which is less than 5% of what needs doing for incremental maintenance of materialized views. If it looks like a good fit, of course I'll use it. Having two parallel mechanisms for changeset extraction in one release seems like a waste of time. I haven't looked in depth at what technique to use for capturing the base relation deltas. The changeset extraction technique is something to consider, for sure. I have a lot of work left to see whether it works for this. In particular, to handle all requested timings, it would need to have low enough latency to provide a delta during the completion of each DML statement, to support requests for eager maintenance of a materialized view -- where the transaction which just changed the base relation would see the effect if they queried the matview. 
That may not be something to tackle in this release, but there are people who want it, and I would prefer to pick a technique which doesn't have a latency high enough to make that impractical. That's not to say that I know that to be a problem for using the changeset extraction technique for this -- just that I haven't gotten to the point of evaluating that. Especially when one is known to be better than the other already. What is the hypothetical technique you're arguing is inferior? For my own part, I haven't gotten beyond the phase of knowing that to meet all requests for the feature, it would need to be available at about the same point that AFTER EACH STATEMENT triggers fire, but that it should not involve any user-written triggers. Have you implemented something similar to what you think I might be considering? Do you have benchmark results? Can you share details? Given that we also want to do concurrent CLUSTER and ALTER TABLE ... SET TABLESPACE using changeset extraction I think it's time that discussion happened on hackers. No objections to that here; but please don't hijack this thread for that discussion. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company